This is the project notebook for the Invasion Heterogeneity project, which uses Jennifer Williams’ Arabidopsis experiments to motivate explorations of the effects of heterogeneity, stochasticity, and evolution on the variation in spread rate.
Had a Skype chat today with Jenn, to get back on track with this project.
We recalled the work I did on the Ler populations in July 2015, which showed that:
Jenn reminded me that the variation in spread was higher in the control than in the evolution treatments for three of the four landscapes (there’s a figure in the appendix of the Science paper). However, that comparison was based on the CV (and it might have been evaluated only at the end of the experiment; I don’t recall whether we looked at each generation’s spread); it might be that the patterns in total variance are different, at least in the fragmented landscape where there was a big difference in mean spread rates.
I hypothesized that among-individual heterogeneity in the dispersal kernel, like variation in growth rate in the IPM, generates population-level effects that don’t disappear as the population size gets large. This comes from Joe’s work in deterministic settings; the question is then: in a stochastic setting, how do the mean and variance of spread depend on individual heterogeneity?
I also wondered how iid variation in spread would translate into increasing variance among replicates over time. Is it like a random walk, with variance increasing linearly through time? This would be a “random ratchet.” It seems as though the strict non-negativity of the change shouldn’t affect the general result that the variance in total distance spread should just be the sum of the per-generation variances, so that inter-replicate variance would grow linearly with time.
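A quick simulation sketch of this intuition (not part of the original analysis; the Poisson increment is an arbitrary stand-in for the per-generation spread distribution):

```r
# Random ratchet: iid non-negative spread increments; the variance of
# cumulative spread across replicates should grow linearly in time
set.seed(1)
n_reps <- 1000
n_gens <- 20
increments <- matrix(rpois(n_reps * n_gens, lambda = 3), nrow = n_reps)
cum_spread <- t(apply(increments, 1, cumsum))
plot(1:n_gens, apply(cum_spread, 2, var),
     xlab = "Generation", ylab = "Inter-replicate variance of cumulative spread")
abline(0, 3)  # theoretical line: slope = per-generation variance (= 3 for Poisson)
```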
The result from the evolution experiment suggests that the heterogeneity created by genetic variation increases stochasticity of some sort, as with iid heterogeneity and Poisson variation.
Central question: What are the drivers of variation in spread rate, and are they predictable?
We discussed the data. There are some potentially useful observations (such as plant height in the Ler experiment) that I’ve not seen before. Jenn gave me access to all of her data on Dropbox.
We also discussed temporal variation. Plant performance is probably best in spring and autumn (the greenhouse is too hot in summer), but there were also some pathogen/pest outbreaks that were not seasonal. To the extent that the latter affected the whole greenhouse, we can just use a generation fixed effect.
I wonder if we know the season of the dispersal experiments?
I’m creating this notebook in bookdown. I ran into a number of poorly documented issues.
- knit: "bookdown::render_book", site: "bookdown::bookdown_site", and output: bookdown::gitbook go in the yaml header (either in index.Rmd or, presumably, _bookdown.yml).
- Individual content files should be named with the date, in the format yyyy-mm-dd.Rmd.
- The first line should also include the date, in the format, e.g., # (PART) 7 April 2017 {-}. This creates a “part” with the date, and all subjects within the day’s work can be top-level headers.
As the notebook gets longer I’ll want to start making good use of preview_chapter(), but it does not (by default) display the page. I’ll need to figure out how to get that working…
I’ll also want to set up a template file, especially once I decide what needs to go into the preamble for R code execution.
I need to look through the bookdown documentation to see what else I can set in _bookdown.yml. A good place to start is the Configuration chapter of Authoring Books with R Markdown. It may be that I can put the index.Rmd file in the project root, and the entries in a subdirectory specified by rmd_subdir, and have the working directory for the Rmd files be the project root.
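For reference, my guess at what the corresponding _bookdown.yml would look like under that layout (key names from the Configuration chapter; untested):

```yaml
book_filename: "notebook"
rmd_subdir: true       # pull in .Rmd files from the subdirectory
output_dir: "_book"
```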
Just for grins, let’s see what the working directory is set to:
getwd()
## [1] "/Users/Bkendall/Documents/github-bitbucket/InvasionHet/notebook"
With index.Rmd and _bookdown.yml in the notebook directory, it gives the notebook directory. Moving index.Rmd and _bookdown.yml to the project root, and adding rmd_subdir: true to the latter, I get the project root as the working directory. This is good! However, it appears that this will lead to Rmd files in all subdirectories getting included. Unless I can use regular expressions in the rmd_files option this will be a dealbreaker eventually; I haven’t found any way to do the latter. Let me try moving everything back to the notebook directory and applying setwd():
setwd("..")
getwd()
## [1] "/Users/Bkendall/Documents/github-bitbucket/InvasionHet"
This seems to work.
I won’t want R code from old entries re-running, not just for time but because data and functions may have changed. I think the key here is to:
- set new_session: yes in _bookdown.yml

But this is incompatible with having Rmd files in subdirectories!!!!
In principle, it may save compilation time to have files without R code simply be md files. But it seems that only Rmd files are searched by default; md files would need to be added by hand to rmd_files.
I also set up the directory structure using ProjectTemplate. The one tricky thing is that Rmd files in this notebook (and in the reports directory as well) may have to chdir to the root, which might be tricky. If I keep the index file in the root as described above then this shouldn’t be a problem.
First, here is a draft of the R header chunk that I will want to include in every post that uses R (silently):
knitr::opts_chunk$set(comment = '', cache = TRUE)
knitr::opts_knit$set(root.dir = "..")
I’m sure I’ll come up with some more.
Now, on to how to load the Arabidopsis data using ProjectTemplate. I started with the Ler population data, which is in multiple xlsx files, each with one sheet having data, one having metadata, and potentially some blank sheets as well. If I follow the standard protocol for ProjectTemplate and drop them into the data directory, then when I execute load.project() I get a data object for every sheet of every file (because there is no way to pass options to read.xlsx()). In addition, once they are loaded as data objects, it is hard to access them in a loop to build the combined datasets. Finally, the workspace is cluttered with all of these objects; if I delete them as part of the data munging, then they will get reloaded the next time I use load.project(), which is time-consuming!
So instead I took the data files out of the project directory, and used the option to have a .R file in the data directory. Using this, I can re-use my existing code to read the files in from the Dropbox directory and manipulate them as I want, deleting the intermediate objects and just keeping the final objects. Furthermore, if I then cache the objects, then subsequent loads are fast!
So here’s the file:
cat(readLines('data/popLer.R'), sep = '\n')
### Creates the data objects popLer and popLer_cm, representing the Ler populations
### experiments
# Raw data consists of one file per generation
# Final column is named inconsistently, so needs to be corrected before merge
data_dir <- "~/Documents/Dropbox/Arabidopsis/Data/Exp1"
data_fname <- list.files(data_dir, "seedposition")
popLer_cm <- NULL
for (i in 1:length(data_fname)) {
tempdata <- xlsx::read.xlsx(file.path(data_dir, data_fname[i]), sheetName = "Data",
header = TRUE)
names(tempdata)[9] <- "seedlings"
popLer_cm <- rbind(popLer_cm, tempdata)
rm(tempdata)
}
# Clean up column names and get a useful order
popLer_cm <- popLer_cm[-1]
names(popLer_cm) <- c("ID", "Treatment", "Rep", "Gap", "Generation", "Pot", "Distance",
"Seedlings")
ord <- with(popLer_cm, order(Treatment, Gap, Rep, Generation, Pot, Distance))
popLer_cm <- popLer_cm[ord,]
# Make a version that just has pot totals
require(plyr)
popLer <- ddply(popLer_cm, .(ID, Gap, Rep, Treatment, Generation, Pot), summarize,
Seedlings = sum(Seedlings))
ord <- with(popLer, order(Treatment, Gap, Rep, Generation, Pot))
popLer <- popLer[ord,]
Now let’s look at what we get:
ProjectTemplate::load.project()
Loading project configuration
Autoloading helper functions
Running helper script: helpers.R
Autoloading cache
Loading cached data set: popLer.cm
Loading cached data set: popLer
Autoloading data
Munging data
Running preprocessing script: 01-A.R
ls()
[1] "config" "helper.function" "popLer" "popLer_cm"
[5] "project.info"
head(popLer)
ID Gap Rep Treatment Generation Pot Seedlings
359 21 0p 1 A 1 0 132
360 21 0p 1 A 1 1 21
361 21 0p 1 A 2 0 74
362 21 0p 1 A 2 1 72
363 21 0p 1 A 2 2 26
364 21 0p 1 A 2 3 2
head(popLer_cm)
ID Treatment Rep Gap Generation Pot Distance Seedlings
40 21 A 1 0p 1 0 4 132
42 21 A 1 0p 1 1 9 5
43 21 A 1 0p 1 1 10 6
44 21 A 1 0p 1 1 11 3
45 21 A 1 0p 1 1 13 4
41 21 A 1 0p 1 1 14 3
Looking good!
This has the added advantage that any changes in the data that Jenn makes can be easily loaded by deleting the cache; and that I can share the project on a public github directory (and Jenn can use it as long as we have our Dropbox in the same place).
I have linked the project to a github repository. I’ve excluded from the repository the .Rdata files in cache (which suggests that I should put the cache() commands in the file generating the data, otherwise I need to remember to do them by hand on a new computer) and the notebook/_book directory, which has files created when I build the notebook. I did retain notebook/_bookdown_files, as that is where the caches for the notebook pages are stored. This means that I am saving binary objects (e.g., .Rdata files), but they shouldn’t get updated unless I modify code blocks in past journal entries.
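Concretely, the cache() calls I have in mind would sit at the end of data/popLer.R, something like:

```r
# Cache the final objects so load.project() restores them from cache/
# rather than re-reading the xlsx files (ProjectTemplate's cache())
cache("popLer_cm")
cache("popLer")
```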
The file notebook/_main.rds also needs to be ignored, as it is regenerated by bookdown each time. Unfortunately it has already been checked in; I need to figure out how to remove it. A quick google reveals that running git rm --cached notebook/_main.rds from the command line does the trick—and I’ve done it.
I’m now experimenting with a new format for the notebook where the “chapter” is the date and the day’s entries are second-level heads. Let’s see how that goes.
I’ve just checked on Jenn’s repository, and it looks like she is using Dropbox directly under her home directory (i.e., file paths read ~/Dropbox). Right now my Dropbox folder is inside ~/Documents (which I think might be the default setup for mac). I have just moved my Dropbox folder up a level (using preferences in the Dropbox app)—I hope that doesn’t break anything else!
I’ve updated popLer.R to:
By deleting the cached .Rdata files and re-running load.project() I confirmed that this all works as expected.
There are two additional datasets from the Ler population, which I have not previously utilized. One is a count of siliques in each pot from treatments A and B, in each generation. The count from treatment B will allow me to look at variation in silique production in isolated plants, and relate siliques to effective seed number.
The other is measurements of plant heights in all pots in generation 5. There are two measurements for each pot: the tallest individual and the height of “the crowd”, which is an effort to estimate typical height in the pot. For treatment B, I can relate height to silique production. I might also be able to use treatment B to look for additional effects of height on dispersal, by simultaneously modeling dispersal from all parent plants and incorporating a height term in the dispersal parameters. In treatment C, I can look for a relationship between the two height measures and density.
As a reminder, here are the treatments:
As far as I understand it, the seedling counts are before any thinning occurs.
This is documented in ~/Dropbox/Arabidopsis/protocols/Exp1.
I have created two data objects. popLer_cm records position along the runway to the nearest centimeter for newly colonized pots (it looks like in gen 2 it was recorded for all pots but that did not recur). popLer simply records number by pot.
As far as I can tell/recall, there was no separate dispersal experiment for Ler. So the dispersal kernels come from the first generation of continuous runways.
Let’s look again at Ler seed production, using generation one data.
First, extract those data for the relevant treatments:
Ler1BC <- subset(popLer, Gap=="0p" & Generation == 1 & Treatment != "A")
Ler1BC$Density <- 1
Ler1BC$Density[Ler1BC$Treatment == "C"] <- 50
Next step: figure out how to use dplyr to get the effective seed number.
I turned on library loading, but for now just have dplyr and ggplot2
(Picking up from Friday)
Let’s look again at Ler seed production, using generation one data.
First, extract those data for the relevant treatments:
Ler1BC <- subset(popLer, Gap=="0p" & Generation == 1 & Treatment != "A")
Ler1BC$Density <- 1
Ler1BC$Density[Ler1BC$Treatment == "C"] <- 50
WARNING
The code that follows calculates effective seed number incorrectly. See the correct version of the analysis in the entry for April 21. (21-04-17)
We calculate the “effective seed number” as the number falling in the mother pot plus twice the number falling in the other pots (the latter gets at bidirectional dispersal, and estimates the number of seeds that would fall somewhere on the runway if the home pot was in the middle of the runway):
seedlings1 <- group_by(Ler1BC, ID, Density) %>%
summarise(eff_sd_no = sum(Seedlings) + sum(Seedlings * Pot > 0))
Plot the distributions:
ggplot(seedlings1, aes(x = eff_sd_no, fill = as.factor(Density))) +
geom_histogram(binwidth = 50)
(of course, stacked histograms aren’t great, but it gives the general idea).
Now look at the mean and variance across pots:
kable(group_by(seedlings1, Density) %>%
summarise(Mean = mean(eff_sd_no), Variance = var(eff_sd_no)),
caption = paste("Mean and variance across pots of effective seed number",
"in treatments B and C of Ler generation 1")
)
| Density | Mean | Variance |
|---|---|---|
| 1 | 108.4 | 1020.267 |
| 50 | 261.4 | 10440.489 |
We can see a number of things from this:
It would be great to have more info on silique number and on height. However, we only have silique number for treatment B (solitary plants); the measurement for treatment A seems to be after thinning. And height is only available in generation 5, at which point it’s difficult to get unambiguous seed production/dispersal estimates (we can, however, look at non-dispersed seed production in the isolated pots).
Some additional analyses to do:
Other thoughts:
Realizing that I need to re-use the seedlings1 data from last time, I see that I should start to use the munge directory for key derived data. I therefore need to give it a more meaningful name; how about Ler_seed_gen1. I’ll also add some more information about home and away.
The code for doing this is in munge/01-Ler-seed-gen1.R
In the process of doing this I discovered that last time I failed to double the dispersing seeds in the effective seed number calculation. So let’s redo the graph and table.
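For the record, the corrected calculation in munge/01-Ler-seed-gen1.R should be along these lines (a sketch reconstructed from the definition of effective seed number, not the actual script):

```r
# home = seedlings in the mother pot; away = seedlings in all other pots;
# dispersers are doubled to account for bidirectional dispersal
Ler_seed_gen1 <- group_by(Ler1BC, ID, Density) %>%
  summarise(home = sum(Seedlings[Pot == 0]),
            away = sum(Seedlings[Pot > 0]),
            eff_sd_no = home + 2 * away)
```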
Plot the distributions:
ggplot(Ler_seed_gen1, aes(x = eff_sd_no, fill = as.factor(Density))) +
geom_histogram(binwidth = 50)
(of course, stacked histograms aren’t great, but it gives the general idea).
Now look at the mean and variance across pots:
kable(group_by(Ler_seed_gen1, Density) %>%
summarise(Mean = mean(eff_sd_no), Variance = var(eff_sd_no)),
caption = paste("Mean and variance across pots of effective seed number",
"in treatments B and C of Ler generation 1")
)
| Density | Mean | Variance |
|---|---|---|
| 1 | 133.9 | 3443.433 |
| 50 | 347.6 | 21362.933 |
This changes the quantitative mean:variance ratios:
Now lets look at how well home seeds predicts total effective seeds:
qplot(home, eff_sd_no, data = Ler_seed_gen1,
group = as.factor(Density), colour = as.factor(Density),
xlab = "Number of seedlings in home pot",
ylab = "Effective seed number"
) +
geom_smooth(method = "lm")
summary(lm(eff_sd_no ~ home * as.factor(Density), data = Ler_seed_gen1))
Call:
lm(formula = eff_sd_no ~ home * as.factor(Density), data = Ler_seed_gen1)
Residuals:
Min 1Q Median 3Q Max
-140.99 -40.94 -7.44 24.48 138.17
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 82.2439 80.5487 1.021 0.322
home 0.5513 0.8223 0.670 0.512
as.factor(Density)50 -37.4497 104.2092 -0.359 0.724
home:as.factor(Density)50 0.8532 0.8708 0.980 0.342
Residual standard error: 74.29 on 16 degrees of freedom
Multiple R-squared: 0.8045, Adjusted R-squared: 0.7678
F-statistic: 21.94 on 3 and 16 DF, p-value: 6.482e-06
There’s no evidence for differences between the two density treatments, so fortunately we can merge them:
qplot(home, eff_sd_no, data = Ler_seed_gen1,
xlab = "Number of seedlings in home pot",
ylab = "Effective seed number"
) +
geom_smooth(method = "lm")
summary(lm(eff_sd_no ~ home, data = Ler_seed_gen1))
Call:
lm(formula = eff_sd_no ~ home, data = Ler_seed_gen1)
Residuals:
Min 1Q Median 3Q Max
-139.659 -44.354 -4.624 31.966 147.853
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.9745 34.2927 0.116 0.909
home 1.5310 0.1935 7.911 2.87e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 74.86 on 18 degrees of freedom
Multiple R-squared: 0.7766, Adjusted R-squared: 0.7642
F-statistic: 62.58 on 1 and 18 DF, p-value: 2.873e-07
Fortunately the relationship is linear and independent of density (although with only two density treatments these conclusions are not super robust). And while we explain 78% of the variance there is still some scatter: the home-pot seed count represents “measurement error” relative to the effective seed number (although the effective seed number itself seems pretty stochastic, as the fraction of seeds dispersing is pretty stochastic). However, it’s the best we have.
We use the 3-pot gaps, looking at seeds in the home pot. The rationale is that a trivial number of seeds will be arriving from other pots.
Ler3pBC <- filter(popLer, Gap == "3p", Treatment != "A")
Ler3pBC <- group_by(Ler3pBC, Treatment, Pot, Rep) %>%
mutate(Adults = lag(Seedlings))
Ler3pBC <- within(Ler3pBC,
{
Adults[Treatment == "B" & !is.na(Adults)] <- 1
Adults[Pot == 0 & Generation == 1] <- 1
Adults[Treatment == "C" & Pot == 0 & Generation == 1] <- 50
})
I couldn’t figure out how to do it all in dplyr!
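For future reference, a possible all-dplyr version using case_when() (an untested sketch; case_when() takes the first condition that matches, so the conditions are in reverse order relative to the within() block above):

```r
Ler3pBC <- filter(popLer, Gap == "3p", Treatment != "A") %>%
  group_by(Treatment, Pot, Rep) %>%
  mutate(Adults = as.numeric(lag(Seedlings)),
         Adults = case_when(
           Treatment == "C" & Pot == 0 & Generation == 1 ~ 50,
           Pot == 0 & Generation == 1                    ~ 1,
           Treatment == "B" & !is.na(Adults)             ~ 1,
           TRUE                                          ~ Adults
         ))
```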
I think there’s something weird going on in generation 6, let’s look at treatment B:
filter(Ler3pBC, Treatment == "B") %>%
ggplot(aes(x = as.factor(Generation), y = Seedlings)) + geom_boxplot(notch = TRUE)
notch went outside hinges. Try setting notch=FALSE.
Actually, it’s the first and last generations that are odd: Gen 1 is unusually productive, and Gen 6 is unusually underproductive.
Let’s look at the density dependence:
qplot(data=Ler3pBC, x = Adults, y = Seedlings/Adults,
colour = as.factor(Generation), log = "xy") +
geom_smooth(method = "lm")
Warning: Removed 15 rows containing non-finite values (stat_smooth).
Warning: Removed 15 rows containing missing values (geom_point).
Generation 6 looks really different.
It’s a bit less pronounced if we leave out treatment B, but it’s still there:
qplot(data=filter(Ler3pBC, Treatment == "C"), x = Adults, y = Seedlings/Adults,
colour = as.factor(Generation), log = "xy") +
geom_smooth(method = "lm")
Warning: Removed 9 rows containing non-finite values (stat_smooth).
Warning: Removed 9 rows containing missing values (geom_point).
So I think we want to leave out gen 6.
Note that there looks to be some curvature in the log-log plot, suggesting that the log transform of Adults is too strong. Need to estimate a power transform instead, or the Wilkinson model.
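One way to estimate that power transform (a hypothetical sketch, not yet run: a Hassell-type recruitment curve, Seedlings/Adults = lambda/(1 + a*Adults)^b, where b = 1 recovers Beverton-Holt; the starting values are guesses):

```r
# Drop gen 6 (anomalous, see above) and zero counts, which break the logs
dd_fit <- nls(log(Seedlings / Adults) ~ log(lambda) - b * log(1 + a * Adults),
              data = filter(Ler3pBC, Generation != 6, Adults > 0, Seedlings > 0),
              start = list(lambda = 5, a = 0.05, b = 1))
```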
First we look at furthest distance reached.
Ler_furthest <- group_by(popLer, Treatment, Gap, Rep, Generation) %>%
summarise(Furthest = max(Pot))
ggplot(Ler_furthest, aes(x = Generation, y = Furthest, color = as.factor(Rep))) +
geom_line() + facet_grid(Treatment ~ Gap)
Hmm, I’ve uncovered that popLer has a line of all NA’s (and popLer_cm has three)! I’ve gone ahead and updated popLer.R (and added plyr to the set of autoloaded packages) to fix this. Now I don’t have extra rows/columns in the facet table!
Now let’s look at a threshold:
threshold <- 100
Ler_thresh <- filter(popLer, Seedlings > threshold) %>%
group_by(Treatment, Gap, Rep, Generation) %>%
summarise(Furthest = max(Pot))
ggplot(Ler_thresh, aes(x = Generation, y = Furthest, color = as.factor(Rep))) +
geom_line() + facet_grid(Treatment ~ Gap)
Not surprisingly, this works best for Treatment C.
mean(filter(Ler_furthest, Treatment == "C", Gap == "0p", Generation == 6)$Furthest)
[1] 14
var(filter(Ler_furthest, Treatment == "C", Gap == "0p", Generation == 6)$Furthest)
[1] 14.44444
mean(filter(Ler_thresh, Treatment == "C", Gap == "0p", Generation == 6)$Furthest)
[1] 12.3
var(filter(Ler_thresh, Treatment == "C", Gap == "0p", Generation == 6)$Furthest)
[1] 11.78889
The mean and variance of total distance travelled are slightly less in the threshold case, but that might all be attributed to generation 1 (effectively, there is a one-generation delay between initial colonization and reaching the threshold, here set to 100 seedlings).
HMMPF Now I’m getting an error from load.project() that seems to derive from 01-Ler-seed-gen1.R — except that the script works fine when I run it in the console!
4/28: Solved this—it had to do with loading both plyr and dplyr in the initialization file. I’ve taken out plyr, and explicitly used plyr::ddply() in the script that generates popLer.
I’ve been farting around with data for a few weeks now. I think it has been useful, both in terms of getting me back into the data and code, and figuring out how to work in the bookdown/ProjectTemplate world, but I can easily get lost in here indefinitely.
So now it’s time to take stock. I also have to think about the talk I’m going to give in Bren in 2 (!) weeks.
So recall the central question: What are the drivers of variation in spread rate, and are they predictable?
There are two components:
I need to distinguish between the empirical patterns that I need to build into the model vs. those that I want the model to predict. The primary one of the latter is the variance in cumulative spread.
I also need to keep in mind that the goal is not necessarily to generate proximal explanations for all of the sources of variability, but to show how each of the sources of variability contributes to the variability in spread. In general, the types of variability are:
It’s straightforward to show that, if the spread increment is iid, then the theoretical variance in cumulative spread grows linearly with time. Has this been noted in the literature? In finite realizations, there will be autocorrelation. A replicate that is ahead of the pack this year will tend to be ahead of the pack next year; and a collection of replicates that has higher than expected variance this year will tend to have higher than expected variance next year.
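For the record, the linear-growth claim follows directly from independence of the increments: if the per-generation spread \(X_t\) is iid with variance \(\sigma^2\), then the cumulative spread \(D_T = \sum_{t=1}^{T} X_t\) satisfies

\[
\operatorname{Var}(D_T) = \sum_{t=1}^{T} \operatorname{Var}(X_t) = T\sigma^2,
\]

so the inter-replicate variance grows linearly and the SD grows like \(\sqrt{T}\).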
Two key questions are then:
Again, the key questions for predicting spread are: How much of the variation in annual spread is “inherently stochastic”? How large is that variance relative to the mean rate of spread? The SD of cumulative spread will grow with the square root of time, so the CV of cumulative spread will actually shrink as \(1/\sqrt{t}\), in proportion to the CV of one-generation spread.
I want to do the first section on variability in spread, and second section on evolution
For the first section:
For the second section:
~/Dropbox/Arabidopsis/analysls/Science_Scripts&Data

Melbourne & Hastings (2009) has this brief review of theory (p. 1537):
Stochastic models are needed to assess uncertainty in the rate of spread because traditional deterministic models provide no information about variance. Analyses have shown that linear stochastic models have the same rate of spread as the equivalent deterministic model, whereas stochasticity slows spread in nonlinear (density-dependent) models (9, 10). Investigations of the variance of spread rate have largely relied on numerical simulations and suggest that variance between realizations can be high, especially when birth rates are low, variation in birth rates between individuals is high, or dispersal includes infrequent long-distance events (11–15). This work suggests that great uncertainty in an invasion forecast can be expected because of inherent biological variability.
The rest of the models in the paper are simulation-based. Here are the references cited above:
Going back to Skellam (1951) (also cited by BM) it starts out looking promising as it is based on “random dispersal”—but in fact he is using microscale randomness to get to diffusion, and the description of spread is deterministic.
Lewis (2000) studies an integro-difference framework: continuous space. His measure of spread is the “expectation velocity:”
This can be described as the speed of movement \(c\) of a point \(x_t\) beyond which the expected number of individuals \(n_e\) is fixed. Thus \(x_t\) is defined so that the integral of the expected density over the interval \((x_t,\infty)\) is \(n_e\). (p. 432)
He does note other studies have looked at furthest extent:
[Other definitions of spread rates are given by] The rate of change of the average location of the farthest forward individual with respect to time [6]. McKean [24] showed that, in certain linear stochastic processes, the distribution of the furthest forward individual can be modeled by a nonlinear deterministic reaction-diffusion equation (KPP or Fisher equation), which in turn exhibits traveling wave solutions.
The paper defies a quick read, but appears to be focused on the expected abundance at a location and the covariance in abundances between two locations (although the covariance might be in the locations of pairs of individuals). I don’t see a straightforward way to connect this to our problem.
24 McKean, H.P.: Application of Brownian motion to the equation of Kolmogorov- Petrovskii-Piscounov, Commun. Pure Appl. Math, 28, 323–331 (1975)
This paper is primarily about showing how deterministic linear models can approximate the mean spread in stochastic and nonlinear models. The stochasticity is again at the micro scale: birth-death with Brownian movement of individuals.
This paper looks at long-distance dispersal (LDD). It is mostly about estimating parameters from data and describing the LDD model (although I think it mostly comes from prior papers by Clark). They say repeatedly that spread in this case is “erratic” and “unpredictable” but don’t quantify it. In three numerical examples (Fig. 2c, 4) they plot medians and 50% and 90% intervals for cumulative distance spread through time. It’s not clear whether these are based on full population models or just draws from the distribution of jump distances. The intervals grow through time, but are smooth only in the case that lacks LDD (Fig. 4b), suggesting that they are not using enough draws to really characterize the distribution. It’s hard to tell how the width of the intervals scales with time.
Possible sources of the LDD model:
Clark et al. (1999) introduces the “2Dt” dispersal kernel. Clark et al. (2001) shows how LDD affects mean speed, and that variability in reproduction in the furthest-extremes model slows spread. They do not address the variance in spread, although it is illustrated in a numerical example (fig. 5).
This paper is a more extensive treatment of the models in Lewis (2000). Basically, the covariance is of interest because it describes the shape of the wavefront and characterizes the extent of patchiness. Variability in spread is not addressed, as far as I can see.
Stochastic models of population spread have been widely studied in the context of the spread of an infection. An introduction to the early work in this area can be found in a review by Mollison [26].
I haven’t read all the papers in BM yet, and I haven’t done any other literature search, but so far I haven’t found any characterizations of the variability in cumulative spread. Mark Lewis has a new (2016) book that will probably have anything that exists in the theory literature - I need to digest that next.
First we look at furthest distance reached.
Ler_furthest_C <- filter(popLer, Treatment == "C") %>%
group_by(Gap, Rep, Generation) %>%
summarise(Furthest = max(Pot))
Now calculate per-generation spread rates
# The "default" argument to lag() replaces the leading NA with the specified value
gen_spread <- group_by(Ler_furthest_C, Gap, Rep) %>%
mutate(speed = Furthest - lag(Furthest, default = 0),
speed_m1 = lag(speed))
Hmm, there is one case (rep 8, 3p gap) where the forward pot seems to go extinct, resulting in a negative speed. Need to check the original data, but for now let’s just set that to zero.
gen_spread <- within(gen_spread, speed[speed < 0] <- 0)
gen_spread <- within(gen_spread, speed_m1[speed_m1 < 0] <- 0)
Look at autocorrelation:
spread_AR <- group_by(gen_spread, Gap, Rep) %>%
summarise(AR1 = cor(speed, speed_m1, use = "complete"))
ggplot(spread_AR, aes(x = AR1)) + geom_histogram(aes(fill = Gap))
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Although it looks like a bias towards negative values, I think that’s mostly from the 3p gaps, where the pattern 0-4-0 will be common.
Look at trends by rep:
ggplot(gen_spread, aes(x = Generation, y = speed, color = as.factor(Rep))) +
geom_point(position = "jitter") +
geom_smooth(method = "lm", se = FALSE) +
facet_wrap(~Gap)
ggplot(gen_spread, aes(x = Generation, y = speed)) +
geom_point(position = "jitter") +
geom_smooth() +
facet_wrap(~Gap)
ggplot(gen_spread, aes(x = Generation, y = speed)) +
geom_point(position = "jitter") +
geom_smooth(method = "gam", formula = y ~ s(x, k = 4)) +
facet_wrap(~Gap)
summary(lm(speed ~ Generation, data = filter(gen_spread, Gap == "0p")))
Call:
lm(formula = speed ~ Generation, data = filter(gen_spread, Gap ==
"0p"))
Residuals:
Min 1Q Median 3Q Max
-2.4705 -1.3090 0.0295 1.1224 3.5295
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.2933 0.4609 7.145 1.67e-09 ***
Generation -0.2743 0.1183 -2.318 0.024 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.566 on 58 degrees of freedom
Multiple R-squared: 0.08476, Adjusted R-squared: 0.06898
F-statistic: 5.371 on 1 and 58 DF, p-value: 0.02402
library(mgcv)
summary(gam(speed ~ s(Generation, k = 4), data = filter(gen_spread, Gap == "0p")))
Family: gaussian
Link function: identity
Formula:
speed ~ s(Generation, k = 4)
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.3333 0.1828 12.77 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Approximate significance of smooth terms:
edf Ref.df F p-value
s(Generation) 2.223 2.597 7.506 0.000784 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
R-sq.(adj) = 0.239 Deviance explained = 26.7%
GCV = 2.1184 Scale est. = 2.0046 n = 60
plot(gam(speed ~ s(Generation, k = 4), data = filter(gen_spread, Gap == "0p")))
anova(gam(speed ~ s(Generation, k = 4), data = filter(gen_spread, Gap == "0p")),
gam(speed ~ Generation, data = filter(gen_spread, Gap == "0p")),
test = "Chisq")
Analysis of Deviance Table
Model 1: speed ~ s(Generation, k = 4)
Model 2: speed ~ Generation
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1 56.403 113.82
2 58.000 142.17 -1.5975 -28.351 0.0004792 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
So there looks to be some nonstationarity in the continuous runways, with the speed peaking in generation 3. The initial increase makes sense, but the decline does not.
Just read the papers from Brett’s and TEX’s labs. As Jenn said, their “no evolution” treatment was spatial shuffling, so it didn’t maintain the original genetic composition.
They both found increased mean and increased variance with evolution. Both found increased dispersal ability at the edge of the evolving populations; the flour beetle experiment also found a reduction in low-density growth rate.
The effects on dispersal are explained as spatial sorting; the growth rate effects are hypothesized to represent “gene surfing,” whereby deleterious alleles exposed by founder effects at the leading edge are carried forward because the next generation’s leading edge is founded primarily by this generation’s leading edge.
The flour beetle experiment found that the among-replicate variance in cumulative spread grew faster than linearly. This is consistent with among-replicate differences in mean speeds (which would generate a quadratic growth in variance through time). This pattern was found in both treatments, suggesting that it’s not just a consequence of founder effects at the leading edge. Perhaps it represents founder effects in the initial establishment from the stock population (each replicate was founded by 20 individuals). The bean beetle experiment doesn’t show this graph but found strong support for a random effect for replicate speed in their model for cumulative spread.
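The linear-versus-quadratic distinction is easy to demonstrate with a toy simulation (base R, made-up parameters, nothing from the experimental data): iid per-generation speeds give among-replicate variance in cumulative spread that grows linearly in time, while replicate-specific mean speeds add a quadratic term.

```r
# Toy check: among-replicate variance of cumulative spread over time.
# iid speeds           -> Var ~ t * sigma^2                    (linear in t)
# replicate-level means -> Var ~ t * sigma^2 + t^2 * Var(mean) (quadratic)
set.seed(1)
n_rep <- 1000; n_gen <- 20

# iid case: every replicate draws per-generation speeds from one distribution
iid_cum <- t(apply(matrix(rnorm(n_rep * n_gen, mean = 2), n_rep, n_gen),
                   1, cumsum))

# heterogeneous case: each replicate gets its own mean speed
rep_means <- rnorm(n_rep, mean = 2, sd = 0.5)
het_cum <- t(apply(matrix(rnorm(n_rep * n_gen, mean = rep_means), n_rep, n_gen),
                   1, cumsum))

var_iid <- apply(iid_cum, 2, var)  # approximately linear in generation
var_het <- apply(het_cum, 2, var)  # linear plus a t^2 component
```

With only six or seven generations, as in these experiments, distinguishing the two regimes by eye is optimistic, which is another argument for putting confidence intervals on the variances.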
So a couple of things to do (both for Ler and the evolution experiments):
To facilitate the above analyses, I have modified the script for popLer to make categorical versions of Generation and Replicate (Gen and Rep, respectively), and put the guts of last time’s data manipulation into a munge script, which creates LerC_spread containing cumulative and per-generation spread for the C treatment replicates.
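As a record of the logic (not the actual munge script), here is a hedged sketch of that data manipulation, run on a toy stand-in for popLer; the columns Treatment, Pot, and Seedlings are assumed from context and may not match the real script exactly.

```r
library(dplyr)

set.seed(4)
# Toy stand-in for popLer: one row per (Rep, Generation, Pot)
popLer_toy <- expand.grid(Treatment = "C", Gap = "0p",
                          Rep = 1:3, Generation = 1:4, Pot = 0:10)
popLer_toy$Seedlings <- rpois(nrow(popLer_toy), 2)

LerC_spread_toy <- popLer_toy %>%
  filter(Treatment == "C", Seedlings > 0) %>%            # occupied pots only
  group_by(Gap, Rep, Generation) %>%
  summarise(Furthest = max(Pot), .groups = "drop") %>%   # cumulative spread
  arrange(Gap, Rep, Generation) %>%
  group_by(Gap, Rep) %>%                                 # lag within replicate
  mutate(speed = Furthest - lag(Furthest, default = 0)) %>%  # per-generation spread
  ungroup() %>%
  mutate(Gen = factor(Generation), Rep = factor(Rep))    # categorical versions
```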
Let’s calculate the means and variances of cumulative spread:
cum_spread_stats <- group_by(LerC_spread, Gap, Gen, Generation) %>%
summarise(mean_spread = mean(Furthest),
var_spread = var(Furthest),
CV_spread = sqrt(var_spread)/mean_spread
)
And now plot the results:
ggplot(aes(x = Generation, y = mean_spread, color = Gap), data = cum_spread_stats) +
geom_point() + geom_smooth(method = "lm")
ggplot(aes(x = Generation, y = var_spread, color = Gap), data = cum_spread_stats) +
geom_point() + geom_line()
ggplot(aes(x = Generation, y = CV_spread, color = Gap), data = cum_spread_stats) +
geom_point() + geom_line()
Warning: Removed 1 rows containing missing values (geom_point).
Warning: Removed 1 rows containing missing values (geom_path).
The linear approximation to mean spread is pretty good, although we see a dip in generation 6, as expected from the reduced seed production, and even without generation 6 the continuous runways seem to be decelerating. The ranking among landscapes makes sense. The variances are rather more all over the map, although we don’t have confidence intervals on them. But except for 1-pot gaps, we can squint and imagine that the variances are linear in time. Note, however, that with only 6 generations we’re not going to easily detect nonlinearities. I don’t know that the CVs tell us much.
Probably the best way to get CIs is to bootstrap replicates.
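A minimal sketch of that bootstrap in base R, resampling whole replicates and recomputing the per-generation variance profile each time; the toy data frame mimics the assumed LerC_spread layout (Rep, Generation, Furthest).

```r
set.seed(2)
# Toy data: 10 replicates, 6 generations, roughly linear spread plus noise
toy <- expand.grid(Rep = 1:10, Generation = 1:6)
toy$Furthest <- 2 * toy$Generation + rnorm(nrow(toy))

# One row per replicate, one column per generation
wide <- with(toy, tapply(Furthest, list(Rep, Generation), mean))

# Resample replicates with replacement; recompute the variance profile
boot_vars <- replicate(2000, {
  rows <- sample(nrow(wide), replace = TRUE)
  apply(wide[rows, , drop = FALSE], 2, var)
})

# 95% percentile interval for the among-replicate variance, per generation
ci <- apply(boot_vars, 1, quantile, probs = c(0.025, 0.975))
```

The same machinery gives intervals for the CV, and applying it to the real LerC_spread would just mean swapping in the actual columns.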
Let’s try fitting a model. We’ll need lagged versions of cumulative spread and per-generation speed; the lags must be taken within each replicate so values don’t leak across runways (speed_m1 is used in the models below):
LerC_spread <- LerC_spread %>%
  group_by(Gap, Rep) %>%
  mutate(Furthestm1 = lag(Furthest, default = 0),
         speed_m1 = lag(speed)) %>%
  ungroup()
m1 <- lm(Furthest ~ Generation + Rep + Generation:Rep + Furthestm1,
data = filter(LerC_spread, Gap == "0p"))
summary(m1)
Call:
lm(formula = Furthest ~ Generation + Rep + Generation:Rep + Furthestm1,
data = filter(LerC_spread, Gap == "0p"))
Residuals:
Min 1Q Median 3Q Max
-2.08017 -0.73937 0.03383 0.78409 2.54087
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.26479 1.38823 -0.191 0.849720
Generation 2.22470 0.58884 3.778 0.000528 ***
Rep2 0.42109 1.81056 0.233 0.817308
Rep3 1.70697 1.81500 0.940 0.352762
Rep4 -0.36309 1.82680 -0.199 0.843484
Rep5 1.94739 1.80616 1.078 0.287571
Rep6 0.91400 1.80930 0.505 0.616284
Rep7 0.73685 1.80286 0.409 0.684987
Rep8 -1.47533 1.86173 -0.792 0.432890
Rep9 1.19824 1.80269 0.665 0.510156
Rep10 0.63691 1.82680 0.349 0.729229
Furthestm1 0.02636 0.21151 0.125 0.901446
Generation:Rep2 0.15787 0.47548 0.332 0.741652
Generation:Rep3 -0.81727 0.47167 -1.733 0.091045 .
Generation:Rep4 1.08491 0.51942 2.089 0.043312 *
Generation:Rep5 0.51499 0.51405 1.002 0.322606
Generation:Rep6 -0.58945 0.47054 -1.253 0.217770
Generation:Rep7 -0.25940 0.46323 -0.560 0.578691
Generation:Rep8 1.45031 0.54307 2.671 0.010986 *
Generation:Rep9 -0.20452 0.46429 -0.440 0.662009
Generation:Rep10 0.65634 0.51942 1.264 0.213878
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.369 on 39 degrees of freedom
Multiple R-squared: 0.9503, Adjusted R-squared: 0.9248
F-statistic: 37.25 on 20 and 39 DF, p-value: < 2.2e-16
car::Anova(m1)
Anova Table (Type II tests)
Response: Furthest
Sum Sq Df F value Pr(>F)
Generation 5.178 1 2.7619 0.10455
Rep 32.986 9 1.9550 0.07214 .
Furthestm1 0.029 1 0.0155 0.90145
Generation:Rep 35.750 9 2.1188 0.05125 .
Residuals 73.114 39
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
This is a little curious: I wasn’t expecting the reps to have different speeds, except as a side effect of autocorrelation. Let’s try a few other models:
m2 <- lm(Furthest ~ Generation + Rep + Generation:Rep,
data = filter(LerC_spread, Gap == "0p"))
m3 <- lm(Furthest ~ Generation + Furthestm1, data = filter(LerC_spread, Gap == "0p"))
anova(m1, m2, m3)
Analysis of Variance Table
Model 1: Furthest ~ Generation + Rep + Generation:Rep + Furthestm1
Model 2: Furthest ~ Generation + Rep + Generation:Rep
Model 3: Furthest ~ Generation + Furthestm1
Res.Df RSS Df Sum of Sq F Pr(>F)
1 39 73.114
2 40 73.143 -1 -0.029 0.0155 0.90145
3 57 141.849 -17 -68.707 2.1558 0.02383 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Hmm, they really do seem to have different slopes. But we have not yet accounted for time-varying speeds. I don’t see how to do that in this context; let’s look at per-generation speeds instead:
m11 <- lm(speed ~ Gen + Rep, data = filter(LerC_spread, Gap == "0p"))
summary(m11)
Call:
lm(formula = speed ~ Gen + Rep, data = filter(LerC_spread, Gap ==
"0p"))
Residuals:
Min 1Q Median 3Q Max
-3.0333 -0.7083 -0.0500 0.8667 2.5333
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.733e+00 6.973e-01 2.486 0.0167 *
Gen2 1.400e+00 6.237e-01 2.245 0.0298 *
Gen3 1.300e+00 6.237e-01 2.084 0.0428 *
Gen4 7.000e-01 6.237e-01 1.122 0.2677
Gen5 4.000e-01 6.237e-01 0.641 0.5246
Gen6 -1.200e+00 6.237e-01 -1.924 0.0607 .
Rep2 7.525e-16 8.052e-01 0.000 1.0000
Rep3 -6.667e-01 8.052e-01 -0.828 0.4121
Rep4 1.000e+00 8.052e-01 1.242 0.2207
Rep5 6.667e-01 8.052e-01 0.828 0.4121
Rep6 -5.000e-01 8.052e-01 -0.621 0.5378
Rep7 -3.333e-01 8.052e-01 -0.414 0.6809
Rep8 1.167e+00 8.052e-01 1.449 0.1543
Rep9 -1.667e-01 8.052e-01 -0.207 0.8370
Rep10 5.000e-01 8.052e-01 0.621 0.5378
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.395 on 45 degrees of freedom
Multiple R-squared: 0.4365, Adjusted R-squared: 0.2612
F-statistic: 2.49 on 14 and 45 DF, p-value: 0.01046
car::Anova(m11)
Anova Table (Type II tests)
Response: speed
Sum Sq Df F value Pr(>F)
Gen 46.133 5 4.7433 0.001451 **
Rep 21.667 9 1.2376 0.297021
Residuals 87.533 45
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Well this is confusing! If the per-generation speeds don’t differ among reps, how can the slopes of cumulative spread do so? Let’s look at some plots:
ggplot(aes(x = Generation, y = Furthest, color = Rep),
data = filter(LerC_spread, Gap == "0p")) +
geom_point() + geom_smooth(method = "lm")
ggplot(aes(x = Generation, y = speed, color = Rep),
data = filter(LerC_spread, Gap == "0p")) +
geom_line()
Let’s look for autocorrelation in speed.
m12 <- lm(speed ~ Gen + Rep + speed_m1, data = filter(LerC_spread, Gap == "0p"))
summary(m12)
Call:
lm(formula = speed ~ Gen + Rep + speed_m1, data = filter(LerC_spread,
Gap == "0p"))
Residuals:
Min 1Q Median 3Q Max
-2.9931 -0.5869 -0.0863 0.7257 2.5069
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.6162 0.7801 4.635 4.8e-05 ***
Gen3 0.3577 0.6601 0.542 0.591277
Gen4 -0.2750 0.6549 -0.420 0.677148
Gen5 -0.7711 0.6313 -1.222 0.230033
Gen6 -2.4692 0.6246 -3.953 0.000357 ***
Rep2 0.2654 0.8793 0.302 0.764579
Rep3 -0.7962 0.8839 -0.901 0.373882
Rep4 1.5962 0.8839 1.806 0.079554 .
Rep5 0.7270 0.8930 0.814 0.421127
Rep6 -0.7308 0.8810 -0.829 0.412469
Rep7 -0.2654 0.8793 -0.302 0.764579
Rep8 1.7923 0.8992 1.993 0.054082 .
Rep9 -0.4000 0.8787 -0.455 0.651778
Rep10 0.8616 0.8879 0.970 0.338540
speed_m1 -0.3270 0.1591 -2.055 0.047378 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.389 on 35 degrees of freedom
(10 observations deleted due to missingness)
Multiple R-squared: 0.5314, Adjusted R-squared: 0.3439
F-statistic: 2.835 on 14 and 35 DF, p-value: 0.006219
car::Anova(m12)
Anova Table (Type II tests)
Response: speed
Sum Sq Df F value Pr(>F)
Gen 47.871 4 6.1995 0.0007015 ***
Rep 32.146 9 1.8502 0.0935244 .
speed_m1 8.154 1 4.2240 0.0473783 *
Residuals 67.566 35
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Aha! There is negative autocorrelation in speed (evidently it was obscured by the time-dependent patterns in the univariate analysis I did last week). One potential explanation is partial “pushing” by older pots, so that a replicate that gets extra far this generation experiences less pushing the next. Alternatively, it could reflect a correlation between the distance spread and the number of plants in the furthest pot.
Let’s do one more analysis with a quadratic in time, so we can interact it with Rep:
m13 <- lm(speed ~ poly(Generation, 2) * Rep + speed_m1,
data = filter(LerC_spread, Gap == "0p"))
summary(m13)
Call:
lm(formula = speed ~ poly(Generation, 2) * Rep + speed_m1, data = filter(LerC_spread,
Gap == "0p"))
Residuals:
Min 1Q Median 3Q Max
-2.14668 -0.47051 0.02562 0.52439 2.46495
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.054e+00 9.512e-01 3.211 0.0046 **
poly(Generation, 2)1 1.990e+00 7.837e+00 0.254 0.8023
poly(Generation, 2)2 -2.395e+00 7.431e+00 -0.322 0.7507
Rep2 4.945e-01 1.154e+00 0.429 0.6730
Rep3 1.295e-01 1.159e+00 0.112 0.9122
Rep4 2.280e+00 1.154e+00 1.976 0.0629 .
Rep5 6.689e-01 1.166e+00 0.574 0.5730
Rep6 -6.265e-01 1.150e+00 -0.545 0.5922
Rep7 1.540e-02 1.160e+00 0.013 0.9895
Rep8 1.467e+00 1.155e+00 1.270 0.2194
Rep9 -9.598e-01 1.150e+00 -0.835 0.4142
Rep10 8.141e-01 1.148e+00 0.709 0.4869
speed_m1 -4.215e-01 1.982e-01 -2.126 0.0468 *
poly(Generation, 2)1:Rep2 -1.014e+01 1.115e+01 -0.910 0.3745
poly(Generation, 2)2:Rep2 -5.019e+00 1.087e+01 -0.462 0.6495
poly(Generation, 2)1:Rep3 -1.910e+01 1.109e+01 -1.722 0.1012
poly(Generation, 2)2:Rep3 7.551e+00 1.057e+01 0.714 0.4838
poly(Generation, 2)1:Rep4 -1.356e+01 1.122e+01 -1.208 0.2417
poly(Generation, 2)2:Rep4 3.330e+00 1.155e+01 0.288 0.7762
poly(Generation, 2)1:Rep5 -3.324e+00 1.110e+01 -0.299 0.7678
poly(Generation, 2)2:Rep5 -8.064e+00 1.046e+01 -0.771 0.4502
poly(Generation, 2)1:Rep6 -3.761e+00 1.109e+01 -0.339 0.7382
poly(Generation, 2)2:Rep6 5.375e-14 1.044e+01 0.000 1.0000
poly(Generation, 2)1:Rep7 -1.072e+01 1.111e+01 -0.965 0.3467
poly(Generation, 2)2:Rep7 -3.057e+00 1.080e+01 -0.283 0.7801
poly(Generation, 2)1:Rep8 5.176e+00 1.121e+01 0.462 0.6496
poly(Generation, 2)2:Rep8 -7.049e+00 1.053e+01 -0.669 0.5114
poly(Generation, 2)1:Rep9 2.805e+00 1.108e+01 0.253 0.8028
poly(Generation, 2)2:Rep9 -1.315e+01 1.057e+01 -1.244 0.2287
poly(Generation, 2)1:Rep10 -6.363e+00 1.115e+01 -0.571 0.5750
poly(Generation, 2)2:Rep10 -1.054e+01 1.087e+01 -0.970 0.3444
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.43 on 19 degrees of freedom
(10 observations deleted due to missingness)
Multiple R-squared: 0.7304, Adjusted R-squared: 0.3048
F-statistic: 1.716 on 30 and 19 DF, p-value: 0.1102
car::Anova(m13)
Anova Table (Type II tests)
Response: speed
Sum Sq Df F value Pr(>F)
poly(Generation, 2) 46.959 2 11.4785 0.0005389 ***
Rep 32.487 9 1.7646 0.1422649
speed_m1 9.249 1 4.5217 0.0467938 *
poly(Generation, 2):Rep 29.613 18 0.8043 0.6761230
Residuals 38.865 19
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Drop the interaction:
m14 <- lm(speed ~ poly(Generation, 2) + Rep + speed_m1,
data = filter(LerC_spread, Gap == "0p"))
summary(m14)
Call:
lm(formula = speed ~ poly(Generation, 2) + Rep + speed_m1, data = filter(LerC_spread,
Gap == "0p"))
Residuals:
Min 1Q Median 3Q Max
-2.89320 -0.68615 -0.06304 0.68486 2.49198
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.9649 0.7018 4.225 0.00015 ***
poly(Generation, 2)1 -4.2553 2.4272 -1.753 0.08786 .
poly(Generation, 2)2 -5.5072 2.3684 -2.325 0.02565 *
Rep2 0.2662 0.8609 0.309 0.75893
Rep3 -0.7985 0.8653 -0.923 0.36206
Rep4 1.5985 0.8653 1.847 0.07269 .
Rep5 0.7309 0.8738 0.836 0.40831
Rep6 -0.7323 0.8626 -0.849 0.40133
Rep7 -0.2662 0.8609 -0.309 0.75893
Rep8 1.7970 0.8797 2.043 0.04824 *
Rep9 -0.4000 0.8604 -0.465 0.64473
Rep10 0.8647 0.8690 0.995 0.32619
speed_m1 -0.3309 0.1526 -2.169 0.03660 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.36 on 37 degrees of freedom
(10 observations deleted due to missingness)
Multiple R-squared: 0.5251, Adjusted R-squared: 0.371
F-statistic: 3.409 on 12 and 37 DF, p-value: 0.002013
car::Anova(m14)
Anova Table (Type II tests)
Response: speed
Sum Sq Df F value Pr(>F)
poly(Generation, 2) 46.959 2 12.6865 6.372e-05 ***
Rep 32.487 9 1.9504 0.07456 .
speed_m1 8.705 1 4.7036 0.03660 *
Residuals 68.478 37
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
So, in conclusion: weak evidence for among-Rep differences in means, but strong evidence for negative autocorrelation. Note that this autocorrelation should act to slow the rate at which the variance in cumulative spread increases with time.
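A quick base-R check of that last claim, using an AR(1) for per-generation speed with coefficient −0.33 (roughly the speed_m1 estimate above; everything else is toy):

```r
set.seed(3)
n_rep <- 2000; n_gen <- 6; phi <- -0.33  # phi roughly matches the speed_m1 estimate

# AR(1) per-generation speeds with marginal variance 1, vs iid speeds
ar_cum  <- replicate(n_rep,
  cumsum(arima.sim(list(ar = phi), n = n_gen, sd = sqrt(1 - phi^2))))
iid_cum <- replicate(n_rep, cumsum(rnorm(n_gen)))

# Among-replicate variance of cumulative spread at each generation;
# negative autocorrelation shrinks the covariance terms in
# Var(sum) = sum of variances + 2 * sum of covariances
var_ar  <- apply(ar_cum, 1, var)
var_iid <- apply(iid_cum, 1, var)
```

Asymptotically the variance of the cumulative sum grows like t(1 + φ)/(1 − φ) times the marginal speed variance, so φ ≈ −0.33 roughly halves the slope relative to iid speeds.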
Let’s check out the other landscapes:
m24 <- lm(speed ~ Gen + Rep + speed_m1,
data = filter(LerC_spread, Gap == "1p"))
m34 <- lm(speed ~ Gen + Rep + speed_m1,
data = filter(LerC_spread, Gap == "2p"))
m44 <- lm(speed ~ Gen + Rep + speed_m1,
data = filter(LerC_spread, Gap == "3p"))
summary(m24)
Call:
lm(formula = speed ~ Gen + Rep + speed_m1, data = filter(LerC_spread,
Gap == "1p"))
Residuals:
Min 1Q Median 3Q Max
-1.9393 -1.0203 -0.2807 0.7435 3.0813
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.939e+00 8.046e-01 2.410 0.0213 *
Gen3 -6.966e-01 6.728e-01 -1.035 0.3076
Gen4 -6.345e-01 6.671e-01 -0.951 0.3481
Gen5 -2.000e-01 6.664e-01 -0.300 0.7658
Gen6 -1.131e+00 6.692e-01 -1.690 0.0999 .
Rep2 1.379e-01 9.504e-01 0.145 0.8855
Rep3 -1.100e-15 9.424e-01 0.000 1.0000
Rep4 -1.008e-15 9.424e-01 0.000 1.0000
Rep5 9.379e-01 9.504e-01 0.987 0.3305
Rep6 4.000e-01 9.424e-01 0.424 0.6738
Rep7 -3.310e-01 9.444e-01 -0.351 0.7280
Rep8 4.690e-01 9.444e-01 0.497 0.6226
Rep9 4.000e-01 9.424e-01 0.424 0.6738
Rep10 1.007e+00 9.604e-01 1.048 0.3017
speed_m1 -1.724e-01 1.543e-01 -1.117 0.2715
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.49 on 35 degrees of freedom
(10 observations deleted due to missingness)
Multiple R-squared: 0.1932, Adjusted R-squared: -0.1295
F-statistic: 0.5988 on 14 and 35 DF, p-value: 0.8474
summary(m34)
Call:
lm(formula = speed ~ Gen + Rep + speed_m1, data = filter(LerC_spread,
Gap == "2p"))
Residuals:
Min 1Q Median 3Q Max
-2.9139 -0.8927 -0.3292 0.9515 3.3861
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.206e+00 8.573e-01 2.573 0.0145 *
Gen3 -3.000e-01 7.061e-01 -0.425 0.6736
Gen4 -5.386e-02 7.076e-01 -0.076 0.9398
Gen5 3.010e-16 7.061e-01 0.000 1.0000
Gen6 -6.000e-01 7.061e-01 -0.850 0.4013
Rep2 1.077e-01 1.003e+00 0.107 0.9151
Rep3 -1.200e+00 9.986e-01 -1.202 0.2376
Rep4 -1.200e+00 9.986e-01 -1.202 0.2376
Rep5 -1.308e+00 1.003e+00 -1.304 0.2007
Rep6 -1.308e+00 1.003e+00 -1.304 0.2007
Rep7 7.077e-01 1.003e+00 0.706 0.4850
Rep8 -1.200e+00 9.986e-01 -1.202 0.2376
Rep9 -2.015e+00 1.015e+00 -1.985 0.0550 .
Rep10 -4.923e-01 1.003e+00 -0.491 0.6266
speed_m1 -1.795e-01 1.524e-01 -1.178 0.2469
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.579 on 35 degrees of freedom
(10 observations deleted due to missingness)
Multiple R-squared: 0.2666, Adjusted R-squared: -0.02679
F-statistic: 0.9087 on 14 and 35 DF, p-value: 0.558
summary(m44)
Call:
lm(formula = speed ~ Gen + Rep + speed_m1, data = filter(LerC_spread,
Gap == "3p"))
Residuals:
Min 1Q Median 3Q Max
-1.8047 -0.8358 -0.5528 0.2732 3.5512
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -3.150e-03 8.867e-01 -0.004 0.997
Gen3 -2.961e-01 7.572e-01 -0.391 0.698
Gen4 8.520e-01 7.464e-01 1.141 0.261
Gen5 -1.921e-01 7.991e-01 -0.240 0.811
Gen6 -3.480e-01 7.464e-01 -0.466 0.644
Rep2 9.039e-01 1.061e+00 0.852 0.400
Rep3 9.039e-01 1.061e+00 0.852 0.400
Rep4 1.808e+00 1.091e+00 1.657 0.106
Rep5 9.039e-01 1.061e+00 0.852 0.400
Rep6 9.039e-01 1.061e+00 0.852 0.400
Rep7 8.000e-01 1.050e+00 0.762 0.451
Rep8 9.039e-01 1.061e+00 0.852 0.400
Rep9 9.039e-01 1.061e+00 0.852 0.400
Rep10 -7.436e-17 1.050e+00 0.000 1.000
speed_m1 -1.299e-01 1.842e-01 -0.705 0.485
Residual standard error: 1.661 on 35 degrees of freedom
(10 observations deleted due to missingness)
Multiple R-squared: 0.1824, Adjusted R-squared: -0.1447
F-statistic: 0.5575 on 14 and 35 DF, p-value: 0.879
car::Anova(m24)
Anova Table (Type II tests)
Response: speed
Sum Sq Df F value Pr(>F)
Gen 7.794 4 0.8776 0.4873
Rep 7.984 9 0.3996 0.9268
speed_m1 2.772 1 1.2485 0.2715
Residuals 77.708 35
car::Anova(m34)
Anova Table (Type II tests)
Response: speed
Sum Sq Df F value Pr(>F)
Gen 2.708 4 0.2715 0.8943
Rep 28.733 9 1.2805 0.2819
speed_m1 3.458 1 1.3869 0.2469
Residuals 87.262 35
car::Anova(m44)
Anova Table (Type II tests)
Response: speed
Sum Sq Df F value Pr(>F)
Gen 9.585 4 0.8687 0.4924
Rep 10.624 9 0.4279 0.9109
speed_m1 1.372 1 0.4974 0.4853
Residuals 96.548 35
Nothing to see here, folks!
The pattern of autocorrelation is, I think, something we want the model to reproduce.
We’ll repeat the analysis with the newly loaded RIL data. Let’s calculate the means and variances of cumulative spread:
cum_spread_stats <- group_by(RIL_spread, Treatment, Gap, Gen, Generation) %>%
summarise(mean_spread = mean(Furthest),
var_spread = var(Furthest),
CV_spread = sqrt(var_spread)/mean_spread
)
And now plot the results:
ggplot(aes(x = Generation, y = mean_spread, color = Treatment), data = cum_spread_stats) +
geom_point() + geom_line() + facet_wrap(~ Gap)
ggplot(aes(x = Generation, y = var_spread, color = Treatment), data = cum_spread_stats) +
geom_point() + geom_line() + facet_wrap(~ Gap)
ggplot(aes(x = Generation, y = CV_spread, color = Treatment), data = cum_spread_stats) +
geom_point() + geom_line() + facet_wrap(~ Gap)
Warning: Removed 1 rows containing missing values (geom_point).
The patterns in the mean are probably not readily distinguishable from linear, although I’d want to see CIs.
Let’s look at per-generation spread.
speed_stats <- group_by(RIL_spread, Treatment, Gap, Gen, Generation) %>%
summarise(mean_spread = mean(speed),
var_spread = var(speed),
CV_spread = sqrt(var_spread)/mean_spread
)
And now plot the results:
ggplot(aes(x = Generation, y = mean_spread, color = Treatment), data = speed_stats) +
geom_point() + geom_line() + facet_wrap(~ Gap)
ggplot(aes(x = Generation, y = var_spread, color = Treatment), data = speed_stats) +
geom_point() + geom_line() + facet_wrap(~ Gap)
ggplot(aes(x = Generation, y = CV_spread, color = Treatment), data = speed_stats) +
geom_point() + geom_line() + facet_wrap(~ Gap)
Warning: Removed 6 rows containing missing values (geom_point).
Warning: Removed 1 rows containing missing values (geom_path).
In 0p and 1p, the treatments look very similar. But I bet there will be differences in the autocorrelation structure. Let’s fit some models.
m1_0NO <- lm(speed ~ Gen + Rep + speed_m1, data = filter(RIL_spread, Gap == "0p", Treatment == "NO"))
summary(m1_0NO)
Call:
lm(formula = speed ~ Gen + Rep + speed_m1, data = filter(RIL_spread,
Gap == "0p", Treatment == "NO"))
Residuals:
Min 1Q Median 3Q Max
-2.3398 -0.9501 0.0388 0.7904 3.5296
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.3567 0.7471 3.154 0.0029 **
Gen3 -0.2490 0.6206 -0.401 0.6902
Gen4 -0.4796 0.6155 -0.779 0.4401
Gen5 0.1000 0.6146 0.163 0.8715
Gen6 1.0612 0.6232 1.703 0.0957 .
Gen7 -0.2470 0.6669 -0.370 0.7129
Rep2 -0.2007 0.7955 -0.252 0.8020
Rep3 0.2347 0.8017 0.293 0.7711
Rep4 0.6667 0.7934 0.840 0.4053
Rep5 1.4013 0.8017 1.748 0.0874 .
Rep6 -0.4660 0.7955 -0.586 0.5610
Rep7 0.2007 0.7955 0.252 0.8020
Rep8 0.7177 0.7981 0.899 0.3734
Rep9 0.9353 0.8119 1.152 0.2556
Rep10 1.0850 0.8063 1.346 0.1853
speed_m1 -0.1020 0.1726 -0.591 0.5575
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.374 on 44 degrees of freedom
(10 observations deleted due to missingness)
Multiple R-squared: 0.2873, Adjusted R-squared: 0.04432
F-statistic: 1.182 on 15 and 44 DF, p-value: 0.3201
car::Anova(m1_0NO)
Anova Table (Type II tests)
Response: speed
Sum Sq Df F value Pr(>F)
Gen 14.832 5 1.5709 0.1882
Rep 18.390 9 1.0820 0.3949
speed_m1 0.660 1 0.3493 0.5575
Residuals 83.090 44
m1_0YES <- lm(speed ~ Gen + Rep + speed_m1, data = filter(RIL_spread, Gap == "0p", Treatment == "YES"))
summary(m1_0YES)
Call:
lm(formula = speed ~ Gen + Rep + speed_m1, data = filter(RIL_spread,
Gap == "0p", Treatment == "YES"))
Residuals:
Min 1Q Median 3Q Max
-4.4242 -0.9333 -0.0096 0.7844 9.4123
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.1877 1.2650 2.520 0.0154 *
Gen3 0.8602 1.0866 0.792 0.4328
Gen4 0.3393 1.0909 0.311 0.7572
Gen5 3.4000 1.0862 3.130 0.0031 **
Gen6 0.4369 1.1807 0.370 0.7131
Gen7 -0.5796 1.0877 -0.533 0.5968
Rep2 -1.2330 1.4031 -0.879 0.3843
Rep3 -0.7330 1.4031 -0.522 0.6040
Rep4 -0.9329 1.4041 -0.664 0.5099
Rep5 -0.6667 1.4023 -0.475 0.6368
Rep6 -0.1667 1.4023 -0.119 0.9059
Rep7 -0.5663 1.4031 -0.404 0.6884
Rep8 -1.1998 1.4025 -0.856 0.3969
Rep9 -0.4668 1.4025 -0.333 0.7408
Rep10 -0.3333 1.4023 -0.238 0.8132
speed_m1 -0.1990 0.1446 -1.376 0.1757
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.429 on 44 degrees of freedom
(10 observations deleted due to missingness)
Multiple R-squared: 0.32, Adjusted R-squared: 0.08825
F-statistic: 1.381 on 15 and 44 DF, p-value: 0.1992
car::Anova(m1_0YES)
Anova Table (Type II tests)
Response: speed
Sum Sq Df F value Pr(>F)
Gen 96.161 5 3.2602 0.01369 *
Rep 9.104 9 0.1715 0.99602
speed_m1 11.173 1 1.8940 0.17572
Residuals 259.560 44
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Nothing here, really, except for a generation effect in the evolution treatment, which is in turn driven by the high mean in generation 5. We don’t have enough df to look at an interaction; let’s look at the distribution of values to see if it’s a single outlier:
filter(RIL_spread, Treatment == "YES", Gap == "0p", Generation == 5)$speed
[1] 16 3 7 6 5 7 1 4 3 3
It’s rep 1 with a crazy jump. Could that be a data error? Let’s look at the time series:
filter(RIL_spread, Treatment == "YES", Gap == "0p", Rep == "1")$speed
[1] 2 -1 0 0 16 1 4
popRIL %>% filter(Treatment == "YES", Gap == "0p", Rep == "1") %>%
select(Generation, Pot, Seedlings) %>%
reshape(timevar = "Generation", direction = "wide", idvar = "Pot")
Pot Seedlings.1 Seedlings.2 Seedlings.3 Seedlings.4 Seedlings.5
1 0 262 503 602 785 506
2 1 47 330 484 454 172
3 2 5 NA NA NA 298
13 3 NA NA NA NA 236
14 4 NA NA NA NA 176
15 5 NA NA NA NA 519
16 6 NA NA NA NA 235
17 7 NA NA NA NA 340
18 8 NA NA NA NA 322
19 9 NA NA NA NA 346
20 10 NA NA NA NA 346
21 11 NA NA NA NA 340
22 12 NA NA NA NA 345
23 13 NA NA NA NA 208
24 14 NA NA NA NA 117
25 15 NA NA NA NA 1
26 17 NA NA NA NA 1
43 16 NA NA NA NA NA
45 18 NA NA NA NA NA
65 19 NA NA NA NA NA
66 20 NA NA NA NA NA
67 22 NA NA NA NA NA
Seedlings.6 Seedlings.7
1 880 271
2 601 166
3 845 249
13 992 332
14 628 225
15 788 388
16 526 571
17 778 285
18 630 563
19 490 384
20 733 268
21 632 312
22 602 368
23 281 133
24 826 194
25 152 266
26 96 355
43 644 136
45 4 251
65 NA 89
66 NA 15
67 NA 1
There is clearly missing data in generations 2-4! This also appears to have been fixed in the Science paper, as there is no replicate sitting at pot 2 for several generations.
I’ve emailed Jenn, but for now let’s look at it without Rep 1.
Let’s calculate the means and variances of cumulative spread:
cum_spread_stats <- filter(RIL_spread, !(Rep == "1" & Treatment == "YES" & Gap == "0p")) %>%
group_by(Treatment, Gap, Gen, Generation) %>%
summarise(mean_spread = mean(Furthest),
var_spread = var(Furthest),
CV_spread = sqrt(var_spread)/mean_spread
)
And now plot the results:
ggplot(aes(x = Generation, y = mean_spread, color = Treatment), data = cum_spread_stats) +
geom_point() + geom_line() + facet_wrap(~ Gap)
ggplot(aes(x = Generation, y = var_spread, color = Treatment), data = cum_spread_stats) +
geom_point() + geom_line() + facet_wrap(~ Gap)
ggplot(aes(x = Generation, y = CV_spread, color = Treatment), data = cum_spread_stats) +
geom_point() + geom_line() + facet_wrap(~ Gap)
Warning: Removed 1 rows containing missing values (geom_point).
The patterns in the mean are probably not readily distinguishable from linear, although I’d want to see CIs.
Let’s look at per-generation spread.
speed_stats <- filter(RIL_spread, !(Rep == "1" & Treatment == "YES" & Gap == "0p")) %>%
group_by(Treatment, Gap, Gen, Generation) %>%
summarise(mean_spread = mean(speed),
var_spread = var(speed),
CV_spread = sqrt(var_spread)/mean_spread
)
And now plot the results:
ggplot(aes(x = Generation, y = mean_spread, color = Treatment), data = speed_stats) +
geom_point() + geom_line() + facet_wrap(~ Gap)
ggplot(aes(x = Generation, y = var_spread, color = Treatment), data = speed_stats) +
geom_point() + geom_line() + facet_wrap(~ Gap)
ggplot(aes(x = Generation, y = CV_spread, color = Treatment), data = speed_stats) +
geom_point() + geom_line() + facet_wrap(~ Gap)
Warning: Removed 6 rows containing missing values (geom_point).
Warning: Removed 1 rows containing missing values (geom_path).
In 0p and 1p, the treatments look very similar. But I bet there will be differences in the autocorrelation structure. Let’s fit some models.
m1_0NO <- lm(speed ~ Gen + Rep + speed_m1,
data = filter(RIL_spread, Gap == "0p", Treatment == "NO"))
summary(m1_0NO)
Call:
lm(formula = speed ~ Gen + Rep + speed_m1, data = filter(RIL_spread,
Gap == "0p", Treatment == "NO"))
Residuals:
Min 1Q Median 3Q Max
-2.3398 -0.9501 0.0388 0.7904 3.5296
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.3567 0.7471 3.154 0.0029 **
Gen3 -0.2490 0.6206 -0.401 0.6902
Gen4 -0.4796 0.6155 -0.779 0.4401
Gen5 0.1000 0.6146 0.163 0.8715
Gen6 1.0612 0.6232 1.703 0.0957 .
Gen7 -0.2470 0.6669 -0.370 0.7129
Rep2 -0.2007 0.7955 -0.252 0.8020
Rep3 0.2347 0.8017 0.293 0.7711
Rep4 0.6667 0.7934 0.840 0.4053
Rep5 1.4013 0.8017 1.748 0.0874 .
Rep6 -0.4660 0.7955 -0.586 0.5610
Rep7 0.2007 0.7955 0.252 0.8020
Rep8 0.7177 0.7981 0.899 0.3734
Rep9 0.9353 0.8119 1.152 0.2556
Rep10 1.0850 0.8063 1.346 0.1853
speed_m1 -0.1020 0.1726 -0.591 0.5575
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.374 on 44 degrees of freedom
(10 observations deleted due to missingness)
Multiple R-squared: 0.2873, Adjusted R-squared: 0.04432
F-statistic: 1.182 on 15 and 44 DF, p-value: 0.3201
car::Anova(m1_0NO)
Anova Table (Type II tests)
Response: speed
Sum Sq Df F value Pr(>F)
Gen 14.832 5 1.5709 0.1882
Rep 18.390 9 1.0820 0.3949
speed_m1 0.660 1 0.3493 0.5575
Residuals 83.090 44
m1_0YES <- lm(speed ~ Gen + Rep + speed_m1,
data = filter(RIL_spread, Gap == "0p", Treatment == "YES", Rep != "1"))
summary(m1_0YES)
Call:
lm(formula = speed ~ Gen + Rep + speed_m1, data = filter(RIL_spread,
Gap == "0p", Treatment == "YES", Rep != "1"))
Residuals:
Min 1Q Median 3Q Max
-3.3389 -0.9048 -0.2336 0.9944 3.2949
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.51572 0.90182 2.790 0.00812 **
Gen3 0.91819 0.78315 1.172 0.24814
Gen4 0.37478 0.79621 0.471 0.64048
Gen5 1.94748 0.78364 2.485 0.01735 *
Gen6 0.08289 0.83462 0.099 0.92140
Gen7 -1.19900 0.78447 -1.528 0.13448
Rep3 0.50000 0.95896 0.521 0.60504
Rep4 0.28939 0.95926 0.302 0.76450
Rep5 0.58789 0.96017 0.612 0.54391
Rep6 1.08789 0.96017 1.133 0.26412
Rep7 0.66667 0.95896 0.695 0.49105
Rep8 0.04394 0.95926 0.046 0.96369
Rep9 0.79850 0.96168 0.830 0.41141
Rep10 0.92122 0.96017 0.959 0.34325
speed_m1 -0.26367 0.14451 -1.825 0.07573 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.661 on 39 degrees of freedom
(9 observations deleted due to missingness)
Multiple R-squared: 0.3745, Adjusted R-squared: 0.1499
F-statistic: 1.668 on 14 and 39 DF, p-value: 0.1038
car::Anova(m1_0YES)
Anova Table (Type II tests)
Response: speed
Sum Sq Df F value Pr(>F)
Gen 49.133 5 3.5619 0.009509 **
Rep 6.751 8 0.3059 0.959291
speed_m1 9.184 1 3.3291 0.075731 .
Residuals 107.593 39
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
There’s still an effect of Gen 5, but now it looks like just the peak of a quadratic pattern:
m2_0NO <- lm(speed ~ poly(Generation, 2) + Rep + speed_m1,
data = filter(RIL_spread, Gap == "0p", Treatment == "NO"))
summary(m2_0NO)
Call:
lm(formula = speed ~ poly(Generation, 2) + Rep + speed_m1, data = filter(RIL_spread,
Gap == "0p", Treatment == "NO"))
Residuals:
Min 1Q Median 3Q Max
-2.8175 -0.9077 -0.0137 0.7040 3.0469
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.4458 0.7068 3.460 0.00116 **
poly(Generation, 2)1 1.8290 2.2192 0.824 0.41402
poly(Generation, 2)2 -0.1183 2.2043 -0.054 0.95744
Rep2 -0.2185 0.8279 -0.264 0.79295
Rep3 0.2704 0.8340 0.324 0.74719
Rep4 0.6667 0.8258 0.807 0.42359
Rep5 1.4371 0.8340 1.723 0.09143 .
Rep6 -0.4481 0.8279 -0.541 0.59087
Rep7 0.2185 0.8279 0.264 0.79295
Rep8 0.7445 0.8304 0.896 0.37456
Rep9 0.9890 0.8440 1.172 0.24723
Rep10 1.1297 0.8385 1.347 0.18437
speed_m1 -0.1556 0.1743 -0.893 0.37649
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.43 on 47 degrees of freedom
(10 observations deleted due to missingness)
Multiple R-squared: 0.1751, Adjusted R-squared: -0.03547
F-statistic: 0.8316 on 12 and 47 DF, p-value: 0.6182
car::Anova(m2_0NO)
Anova Table (Type II tests)
Response: speed
Sum Sq Df F value Pr(>F)
poly(Generation, 2) 1.757 2 0.4294 0.6534
Rep 19.278 9 1.0469 0.4185
speed_m1 1.631 1 0.7972 0.3765
Residuals 96.165 47
m2_0YES <- lm(speed ~ poly(Generation, 2) + Rep + speed_m1,
data = filter(RIL_spread, Gap == "0p", Treatment == "YES", Rep != "1"))
summary(m2_0YES)
Call:
lm(formula = speed ~ poly(Generation, 2) + Rep + speed_m1, data = filter(RIL_spread,
Gap == "0p", Treatment == "YES", Rep != "1"))
Residuals:
Min 1Q Median 3Q Max
-2.7428 -0.9827 -0.2568 0.8754 3.4194
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.71330 0.78203 3.470 0.00122 **
poly(Generation, 2)1 1.68876 2.66425 0.634 0.52961
poly(Generation, 2)2 -8.22899 2.61909 -3.142 0.00307 **
Rep3 0.50000 0.98302 0.509 0.61367
Rep4 0.28026 0.98328 0.285 0.77703
Rep5 0.60615 0.98407 0.616 0.54124
Rep6 1.10615 0.98407 1.124 0.26737
Rep7 0.66667 0.98302 0.678 0.50137
Rep8 0.05308 0.98328 0.054 0.95721
Rep9 0.82590 0.98538 0.838 0.40669
Rep10 0.93949 0.98407 0.955 0.34519
speed_m1 -0.31846 0.13641 -2.335 0.02442 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.703 on 42 degrees of freedom
(9 observations deleted due to missingness)
Multiple R-squared: 0.2921, Adjusted R-squared: 0.1067
F-statistic: 1.576 on 11 and 42 DF, p-value: 0.1417
car::Anova(m2_0YES)
Anova Table (Type II tests)
Response: speed
Sum Sq Df F value Pr(>F)
poly(Generation, 2) 34.970 2 6.0315 0.004981 **
Rep 7.029 8 0.3031 0.960710
speed_m1 15.799 1 5.4499 0.024425 *
Residuals 121.756 42
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
In the evolution treatment there is a strong quadratic effect of generation and negative autocorrelation; nothing in the no-evolution treatment!
Let’s look at the rest of the landscapes.
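The remaining landscapes reuse the same model (and, by copy-paste, the same `m2_0NO`/`m2_0YES` names even for the 1p–3p gaps). A more compact alternative, sketched here assuming `RIL_spread` has the columns used above, would fit all Gap × Treatment combinations in one loop:

```r
# Fit the common model to one Gap x Treatment subset of RIL_spread; Rep 1 is
# dropped in the evolution ("YES") treatment, as in the fits above.
fit_spread <- function(dat, gap, trt) {
  d <- subset(dat, Gap == gap & Treatment == trt)
  if (trt == "YES") d <- subset(d, Rep != "1")
  lm(speed ~ poly(Generation, 2) + Rep + speed_m1, data = d)
}

fits <- list()
for (gap in c("0p", "1p", "2p", "3p")) {
  for (trt in c("NO", "YES")) {
    fits[[paste(gap, trt, sep = "_")]] <- fit_spread(RIL_spread, gap, trt)
  }
}
# then e.g. car::Anova(fits[["1p_NO"]]) gives the per-landscape table
```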
m2_0NO <- lm(speed ~ poly(Generation, 2) + Rep + speed_m1,
data = filter(RIL_spread, Gap == "1p", Treatment == "NO"))
summary(m2_0NO)
Call:
lm(formula = speed ~ poly(Generation, 2) + Rep + speed_m1, data = filter(RIL_spread,
Gap == "1p", Treatment == "NO"))
Residuals:
Min 1Q Median 3Q Max
-2.6149 -0.7524 0.1080 0.6191 3.1936
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.3527 0.5187 2.608 0.01218 *
poly(Generation, 2)1 5.7573 1.8493 3.113 0.00315 **
poly(Generation, 2)2 -4.4217 1.8359 -2.408 0.02000 *
Rep2 0.2124 0.6976 0.304 0.76215
Rep3 0.9852 0.7056 1.396 0.16919
Rep4 -0.2271 0.6928 -0.328 0.74447
Rep5 -0.2271 0.6928 -0.328 0.74447
Rep6 -0.3333 0.6912 -0.482 0.63186
Rep7 0.6519 0.7056 0.924 0.36027
Rep8 0.6519 0.7056 0.924 0.36027
Rep9 0.5457 0.6976 0.782 0.43801
Rep10 -1.2124 0.6976 -1.738 0.08879 .
speed_m1 -0.3186 0.1419 -2.246 0.02947 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.197 on 47 degrees of freedom
(10 observations deleted due to missingness)
Multiple R-squared: 0.3434, Adjusted R-squared: 0.1758
F-statistic: 2.049 on 12 and 47 DF, p-value: 0.04033
car::Anova(m2_0NO)
Anova Table (Type II tests)
Response: speed
Sum Sq Df F value Pr(>F)
poly(Generation, 2) 15.094 2 5.2656 0.008641 **
Rep 18.723 9 1.4514 0.194196
speed_m1 7.228 1 5.0431 0.029466 *
Residuals 67.363 47
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
m2_0YES <- lm(speed ~ poly(Generation, 2) + Rep + speed_m1,
data = filter(RIL_spread, Gap == "1p", Treatment == "YES", Rep != "1"))
summary(m2_0YES)
Call:
lm(formula = speed ~ poly(Generation, 2) + Rep + speed_m1, data = filter(RIL_spread,
Gap == "1p", Treatment == "YES", Rep != "1"))
Residuals:
Min 1Q Median 3Q Max
-2.4322 -0.8911 -0.2425 0.8770 3.2000
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.391e+00 6.566e-01 3.642 0.000737 ***
poly(Generation, 2)1 -2.207e+00 2.251e+00 -0.980 0.332469
poly(Generation, 2)2 -4.364e-01 2.199e+00 -0.198 0.843638
Rep3 -8.006e-01 8.521e-01 -0.940 0.352776
Rep4 -1.134e+00 8.521e-01 -1.331 0.190420
Rep5 -1.467e+00 8.521e-01 -1.722 0.092418 .
Rep6 -1.694e-15 8.464e-01 0.000 1.000000
Rep7 -1.994e-01 8.521e-01 -0.234 0.816132
Rep8 5.343e-01 8.591e-01 0.622 0.537352
Rep9 -7.336e-01 8.478e-01 -0.865 0.391776
Rep10 -1.067e+00 8.478e-01 -1.258 0.215167
speed_m1 -2.009e-01 1.469e-01 -1.368 0.178626
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.466 on 42 degrees of freedom
(9 observations deleted due to missingness)
Multiple R-squared: 0.2046, Adjusted R-squared: -0.003779
F-statistic: 0.9819 on 11 and 42 DF, p-value: 0.4775
car::Anova(m2_0YES)
Anova Table (Type II tests)
Response: speed
Sum Sq Df F value Pr(>F)
poly(Generation, 2) 3.847 2 0.8950 0.4163
Rep 18.402 8 1.0703 0.4019
speed_m1 4.021 1 1.8711 0.1786
Residuals 90.269 42
m2_0NO <- lm(speed ~ poly(Generation, 2) + Rep + speed_m1,
data = filter(RIL_spread, Gap == "2p", Treatment == "NO"))
summary(m2_0NO)
Call:
lm(formula = speed ~ poly(Generation, 2) + Rep + speed_m1, data = filter(RIL_spread,
Gap == "2p", Treatment == "NO"))
Residuals:
Min 1Q Median 3Q Max
-2.5500 -1.0534 -0.1734 0.5902 4.3641
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 9.251e-01 6.895e-01 1.342 0.1861
poly(Generation, 2)1 -4.270e+00 2.527e+00 -1.690 0.0977 .
poly(Generation, 2)2 6.366e+00 2.470e+00 2.577 0.0132 *
Rep2 -6.145e-01 9.541e-01 -0.644 0.5226
Rep3 1.844e+00 9.791e-01 1.883 0.0659 .
Rep4 1.229e+00 9.635e-01 1.276 0.2084
Rep5 -1.145e-01 9.541e-01 -0.120 0.9050
Rep6 7.448e-16 9.509e-01 0.000 1.0000
Rep7 6.145e-01 9.541e-01 0.644 0.5226
Rep8 6.145e-01 9.541e-01 0.644 0.5226
Rep9 1.115e+00 9.541e-01 1.168 0.2486
Rep10 3.855e-01 9.541e-01 0.404 0.6880
speed_m1 -2.290e-01 1.555e-01 -1.473 0.1474
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.647 on 47 degrees of freedom
(10 observations deleted due to missingness)
Multiple R-squared: 0.2538, Adjusted R-squared: 0.06328
F-statistic: 1.332 on 12 and 47 DF, p-value: 0.2331
car::Anova(m2_0NO)
Anova Table (Type II tests)
Response: speed
Sum Sq Df F value Pr(>F)
poly(Generation, 2) 18.231 2 3.3605 0.04324 *
Rep 24.883 9 1.0192 0.43883
speed_m1 5.885 1 2.1694 0.14745
Residuals 127.489 47
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
m2_0YES <- lm(speed ~ poly(Generation, 2) + Rep + speed_m1,
data = filter(RIL_spread, Gap == "2p", Treatment == "YES", Rep != "1"))
summary(m2_0YES)
Call:
lm(formula = speed ~ poly(Generation, 2) + Rep + speed_m1, data = filter(RIL_spread,
Gap == "2p", Treatment == "YES", Rep != "1"))
Residuals:
Min 1Q Median 3Q Max
-2.5667 -1.4896 -0.9113 1.4641 4.6388
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.1786 0.9285 2.346 0.0238 *
poly(Generation, 2)1 -1.1631 3.4858 -0.334 0.7403
poly(Generation, 2)2 -3.2860 3.6111 -0.910 0.3680
Rep3 -0.2584 1.2902 -0.200 0.8422
Rep4 -1.1208 1.2815 -0.875 0.3868
Rep5 -0.3792 1.2815 -0.296 0.7687
Rep6 -0.3792 1.2815 -0.296 0.7687
Rep7 -1.1208 1.2815 -0.875 0.3868
Rep8 0.2416 1.2902 0.187 0.8524
Rep9 -1.0000 1.2786 -0.782 0.4385
Rep10 -0.3792 1.2815 -0.296 0.7687
speed_m1 -0.2416 0.1731 -1.396 0.1701
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.215 on 42 degrees of freedom
(9 observations deleted due to missingness)
Multiple R-squared: 0.1019, Adjusted R-squared: -0.1334
F-statistic: 0.4331 on 11 and 42 DF, p-value: 0.9321
car::Anova(m2_0YES)
Anova Table (Type II tests)
Response: speed
Sum Sq Df F value Pr(>F)
poly(Generation, 2) 9.878 2 1.0071 0.3739
Rep 10.909 8 0.2781 0.9696
speed_m1 9.553 1 1.9479 0.1701
Residuals 205.971 42
m2_0NO <- lm(speed ~ poly(Generation, 2) + Rep + speed_m1,
data = filter(RIL_spread, Gap == "3p", Treatment == "NO"))
summary(m2_0NO)
Call:
lm(formula = speed ~ poly(Generation, 2) + Rep + speed_m1, data = filter(RIL_spread,
Gap == "3p", Treatment == "NO"))
Residuals:
Min 1Q Median 3Q Max
-4.4865 -0.7241 -0.0991 0.2443 3.1763
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.595e+00 6.701e-01 2.381 0.0214 *
poly(Generation, 2)1 2.321e-01 2.426e+00 0.096 0.9242
poly(Generation, 2)2 2.721e+00 2.405e+00 1.131 0.2636
Rep2 -6.928e-16 9.135e-01 0.000 1.0000
Rep3 -1.524e+00 9.207e-01 -1.655 0.1045
Rep4 -1.524e+00 9.207e-01 -1.655 0.1045
Rep5 -8.573e-01 9.207e-01 -0.931 0.3565
Rep6 -4.760e-01 9.207e-01 -0.517 0.6076
Rep7 -1.524e+00 9.207e-01 -1.655 0.1045
Rep8 -6.667e-01 9.135e-01 -0.730 0.4692
Rep9 -6.667e-01 9.135e-01 -0.730 0.4692
Rep10 -1.333e+00 9.135e-01 -1.460 0.1511
speed_m1 -2.860e-01 1.715e-01 -1.667 0.1021
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.582 on 47 degrees of freedom
(10 observations deleted due to missingness)
Multiple R-squared: 0.1767, Adjusted R-squared: -0.03345
F-statistic: 0.8409 on 12 and 47 DF, p-value: 0.6093
car::Anova(m2_0NO)
Anova Table (Type II tests)
Response: speed
Sum Sq Df F value Pr(>F)
poly(Generation, 2) 5.157 2 1.0299 0.3649
Rep 18.543 9 0.8229 0.5982
speed_m1 6.961 1 2.7805 0.1021
Residuals 117.671 47
m2_0YES <- lm(speed ~ poly(Generation, 2) + Rep + speed_m1,
data = filter(RIL_spread, Gap == "3p", Treatment == "YES", Rep != "1"))
summary(m2_0YES)
Call:
lm(formula = speed ~ poly(Generation, 2) + Rep + speed_m1, data = filter(RIL_spread,
Gap == "3p", Treatment == "YES", Rep != "1"))
Residuals:
Min 1Q Median 3Q Max
-3.3934 -1.1567 -0.0894 0.8961 6.0397
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.543e+00 8.663e-01 1.781 0.082173 .
poly(Generation, 2)1 4.948e+00 3.323e+00 1.489 0.143922
poly(Generation, 2)2 -1.057e+01 3.209e+00 -3.293 0.002019 **
Rep3 2.596e-15 1.189e+00 0.000 1.000000
Rep4 1.009e+00 1.193e+00 0.846 0.402187
Rep5 1.009e+00 1.193e+00 0.846 0.402187
Rep6 -1.009e+00 1.193e+00 -0.846 0.402187
Rep7 1.009e+00 1.193e+00 0.846 0.402187
Rep8 1.832e-15 1.189e+00 0.000 1.000000
Rep9 -1.009e+00 1.193e+00 -0.846 0.402187
Rep10 -1.009e+00 1.193e+00 -0.846 0.402187
speed_m1 -5.141e-01 1.322e-01 -3.888 0.000353 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.06 on 42 degrees of freedom
(9 observations deleted due to missingness)
Multiple R-squared: 0.381, Adjusted R-squared: 0.2189
F-statistic: 2.35 on 11 and 42 DF, p-value: 0.02312
car::Anova(m2_0YES)
Anova Table (Type II tests)
Response: speed
Sum Sq Df F value Pr(>F)
poly(Generation, 2) 47.607 2 5.6079 0.0069405 **
Rep 34.413 8 1.0134 0.4408425
speed_m1 64.175 1 15.1191 0.0003532 ***
Residuals 178.275 42
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
It’s a very mixed bag here. In summary, quadratic generation effects and negative autocorrelation show up in the evolution treatment for the 0p and 3p landscapes and in the no-evolution treatment for 1p (plus a weak quadratic-only signal in the 2p no-evolution treatment).
I’m not sure how to interpret this; maybe these are just random artifacts?
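A quick multiplicity check lends some support to the artifact worry (though it can’t explain the handful of very small p-values): treating the quadratic Generation term and the `speed_m1` term in each of the eight fits as independent tests at α = 0.05, a spurious “significant” result somewhere is more likely than not. A back-of-envelope calculation, not a formal correction:

```r
# Eight Gap x Treatment fits, two focal terms each = 16 tests; under a global
# null with independent tests at alpha = 0.05:
1 - (1 - 0.05)^16
#> ~0.56
```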
I’ve had some email exchanges with Jenn about data—basically, I was mistaken in using the Excel files from the data directory. In particular, she said about the RIL data (on May 10):
The data that we used to make figure 2 came from this file (in the analysis folder):
seedlingsALL<-read.csv('2016_03_04_seedlingsALL_corrected.csv', header=T)
This data file does not include the Ler data. She said, in addition:
For the Exp 1 data, use the script
00_ArabiDD_ImportTransformData.R (in my github, I hope!) to read in the corrected files.
It is not in the Dropbox directory; rather it is at https://github.com/jennwilliamsubc/ArabiDD_analyses/blob/master/00_ArabiDD_ImportTransformData.R.
The file is displayed below; it does a lot more than import seedling data! It seems to pull in all the other ancillary data for Ler, so it will be a good guide for that.
One thing I will need to ask about is the ancillary data for RILs.
The Arabidopsis work was to have been the centerpiece of my presentation to the faculty & PhD students at Bren yesterday, but as it turned out I didn’t even get to that section. In the meeting with the students a few questions came up but they were generally clarifying questions—nothing about the actual results!
Here it is, as found on github with a date of Aug 24, 2016:
##June 21, 2016
#Use this source file for all of the scripts for the Arabidopsis Density Project
#It doesn't need to be in MarkDown.
#Idea is to pull all data to here, and then to source these Data when you run the other files.
setwd("~/Dropbox/Arabidopsis/analysis")
#libraries used in the projects
#can you rewrite these as require?
library(plyr)
library(lattice)
library(Hmisc)
#import files that are siliques
gen0sil<-read.csv('2013_11_11_Exp1_Gen0_siliques.csv', header=T)
names(gen0sil)<-c("gen", "ID", "trt","gap_size","pot","siliques")
gen1sil<-read.csv("2014_04_15_Exp1_Gen1_siliques.csv", header=T)
names(gen1sil)<-c("gen", "ID", "trt","gap_size","pot","siliques")
gen2sil<-read.csv("2014_06_12_Exp1_Gen2_siliques.csv", header=T)
#note, I deleted the column called 'rep' from the csv. I'm not sure what it means, and the other data files don't have it
names(gen2sil)<-c("gen", "ID", "trt","gap_size","pot","siliques")
gen3sil<-read.csv("2014_11_27_Exp1_Gen3_siliques.csv", header=T)
names(gen3sil)<-c("gen", "ID", "trt","gap_size","pot","siliques")
gen4sil<-read.csv("2015_03_13_Exp1_Gen4_siliques.csv", header=T)
names(gen4sil)<-c("gen", "ID", "trt","gap_size","pot","siliques")
gen5sil<-read.csv("2015_08_25_Exp1_Gen5_siliques.csv", header=T)
names(gen5sil)<-c("gen", "ID", "trt","gap_size","pot","siliques")
allsil<-rbind(gen0sil, gen1sil, gen2sil, gen3sil, gen4sil, gen5sil)
#since we used the counts from trtA plants for trtB, we can only analyze variation from trtA plants
allsilA<-subset(allsil, trt=="A")
#we could also merge these data (very carefully!) with the runway data from the next generation
#that would allow us to say something about how silique number affects dispersal/spread in the two treatments
#HOWEVER, this will only include a subset of the data, because the silique numbers we have are not necessarily for the leading edge of the solitary plants (should be of the clipped, though!)
#I collected height data in October 2015
#I did not enter the data, and there are some funny things that I should have noticed earlier!
#for example, every record should have a chamber number, but those are missing
#do the raw data sheets exist anywhere?
gen5heights<-read.csv("2015_10_26_Exp1_Gen5_Height.csv", header=T)
########-----------
###import the side experiment data here (and anything else you think might be helpful)
disp_spray<-read.csv("2013_08_08_Exp1_Spray.csv", header=T)
#names(disp_spray)<-c("ecot","trt","mom_pot","pot","d","sdlgs","mom_sil")
#import Frankenstein data here (all data that are relevant to effects of density on height & siliques)
frank_traits<-read.csv("2016_06_24_LerTraitsxDensity_ALLexperiments.csv", header=T)
frank_traits_ring<-subset(frank_traits, exp=="ring")
testply<-ddply(frank_traits, .(ID), summarise,
meansil=mean(siliques, na.rm=T),
meanht=mean(height, na.rm=T))
meantraits<-join(testply, frank_traits, by="ID",type="left",match="first")
meantraits$totsil<-meantraits$meansil*meantraits$final_density
########-----------
###ALL SEEDLING DATA
seedlings_gen1<-read.csv("2014_02_18_Exp1_Gen1_seedposition.csv", header=T)
names(seedlings_gen1)<-c("ecot","ID","trt","Rep","gap_size","gen","pot","d","sdlgs")
seedlings_gen2<-read.csv("2014_06_12_Exp1_Gen2_seedposition.csv", header=T)
names(seedlings_gen2)<-c("ecot","ID","trt","Rep","gap_size","gen","pot","d","sdlgs")
seedlings_gen3<-read.csv("2014_11_11_Exp1_Gen3_seedposition.csv", header=T) #Gen3
names(seedlings_gen3)<-c("ecot","ID","trt","Rep","gap_size","gen","pot","d","sdlgs")
seedlings_gen4<-read.csv("2015_03_11_Exp1_Gen4_seedposition.csv", header=T) #Gen4
names(seedlings_gen4)<-c("ecot","ID","trt","Rep","gap_size","gen","pot","d","sdlgs")
seedlings_gen5<-read.csv("2015_08_11_Exp1_Gen5_seedposition.csv", header=T) #Gen5
names(seedlings_gen5)<-c("ecot","ID","trt","Rep","gap_size","gen","pot","d","sdlgs")
seedlings_gen6<-read.csv("2015_11_10_Exp1_Gen6_seedposition.csv", header=T) #Gen6
names(seedlings_gen6)<-c("ecot","ID","trt","Rep","gap_size","gen","pot","d","sdlgs")
seedlings_gen6<- seedlings_gen6[-140,] #drop ID 27 record of jumping 8 pots
seedlings_gen6<- seedlings_gen6[-c(163:169),] #drop ID 32 record of jumping 6 pots and being super dense
seedlingsALL<-rbind(seedlings_gen1,seedlings_gen2, seedlings_gen3, seedlings_gen4, seedlings_gen5, seedlings_gen6)
#drop runways with problems
# 35 & 60: a pot went missing (so it looks like the runway shrank)
# 45: different data on spreadsheet than on data sheets and can't figure it out. also, HUGE leap
seedlingsALL<-subset(seedlingsALL,!(ID %in% c(35, 60)))
############CALCULATE MAXD FOR EACH GENERATION AND MAKE FILE OF SPEEDS
#calculate max distance and number of plants/pot in all generations
maxd_ply<-ddply(seedlingsALL, .(ID,gen), summarise,
lastpot=max(pot, na.rm=T),
farseed=max(d,na.rm=T),
lastsdlgs=sum(sdlgs[pot==max(pot)])) #this selects number of sdlgs in the last pot
maxd<-join(maxd_ply, seedlingsALL, by=c("ID","gen"),type="left",match="first")
maxd<-maxd[,1:9]
#now, move dense pots to the furthest right they could be
maxd$farseed<-ifelse(maxd$lastpot==0,7,maxd$farseed)
maxd$farseed <-ifelse(maxd$lastpot==2 & maxd$lastsdlgs>=10,22,maxd$farseed)
maxd$farseed <-ifelse(maxd$lastpot==3 & maxd$lastsdlgs>=10,29,maxd$farseed)
maxd$farseed <-ifelse(maxd$lastpot==4 & maxd$lastsdlgs>=10,36,maxd$farseed)
maxd$farseed <-ifelse(maxd$lastpot==5 & maxd$lastsdlgs>=10,44,maxd$farseed)
maxd$farseed <-ifelse(maxd$lastpot==6 & maxd$lastsdlgs>=10,51,maxd$farseed)
maxd$farseed <-ifelse(maxd$lastpot==7 & maxd$lastsdlgs>=10,58,maxd$farseed)
maxd$farseed <-ifelse(maxd$lastpot==8 & maxd$lastsdlgs>=10,66,maxd$farseed)
maxd$farseed <-ifelse(maxd$lastpot==9 & maxd$lastsdlgs>=10,73,maxd$farseed)
maxd$farseed <-ifelse(maxd$lastpot==10 & maxd$lastsdlgs>=10,80,maxd$farseed)
maxd$farseed <-ifelse(maxd$lastpot==11 & maxd$lastsdlgs>=10,88,maxd$farseed)
maxd$farseed <-ifelse(maxd$lastpot==12 & maxd$lastsdlgs>=10,95,maxd$farseed)
maxd$farseed <-ifelse(maxd$lastpot==13 & maxd$lastsdlgs>=10,102,maxd$farseed)
maxd$farseed <-ifelse(maxd$lastpot==16 & maxd$lastsdlgs>=10,124,maxd$farseed)
maxd$farseed <-ifelse(maxd$lastpot==17 & maxd$lastsdlgs>=10,132,maxd$farseed)
#rescale so pot 0 is at 0
maxd$farseed<-maxd$farseed - 7
#create some variables so we can look at speed from different time points.
#Goal: I want to be able to say something about distance traveled between gen. 1 and gen. 6 and between gen. 1 and gen. 5
speeds<-ddply(maxd, .(ID), summarise,
dist1to6 = farseed[6] - farseed[1],
dist1to5 = farseed[5] - farseed[1],
farseed6 = farseed[6],
farseed5 = farseed[5],
dist1to3 = farseed[3] - farseed[1])
speeds<-join(speeds, maxd, by=c("ID"),type="left",match="first")
speeds<-speeds[,c(1:6, 12:14)]
##ADD gap size as a continuous variable to the dataframe with Bruce's function.
# Function to add gap size as a continuous variable in units of mean dispersal distance
# (1 pot is 2x mean dispersal) to a data frame.
#The estimate of 1 pot = 2 x MDD is still an estimate, but better than what we had before.
#update again when you calculate the mean dispersal distance of Ler more precisely
# Argument must be a data frame with a column named "gap_size"
# Relies on the fact that the levels of the variable (1, 2, 3, 4) correspond to gap sizes
# (0, 1, 2, 3) pots
add_gapnumber <- function(x) {
x$gapnumber <- (as.numeric(x$gap_size) - 1) * 2
return(x)
}
maxd<-add_gapnumber(maxd)
speeds<-add_gapnumber(speeds)
####*********************************************************************************
########HEIGHT DATA FROM GEN. 5
gen5heights<-read.csv("2015_10_26_Exp1_Gen5_Height.csv", header=T)
#a few more transformations and joining
maxd_gen5<-subset(maxd, gen==5)
maxd_gen5<-join(maxd_gen5, gen5heights, by=c("ID"),type="left",match="first")
maxd_gen6<-subset(maxd, gen==6)
#this might be dangerous (but I don't think I will be sorting the data)
maxd_gen5$farseedT1<-maxd_gen6$farseed
maxd_gen5$laspotT1<-maxd_gen6$lastpot
maxd_gen5$distTravel<-maxd_gen5$farseedT1-maxd_gen5$farseed
####*********************************************************************************
###CAN YOU COMBINE SILIQUES AT FRONT IN PREV. GEN WITH DISTANCE IN NEXT GEN?
##NOTE that you can only do this for treatments A and B
#here are the siliques for each ID in the furthest pot where they were counted
sil_ply<-ddply(allsil, .(ID,gen), summarise,
lastpot_sil=max(pot, na.rm=T),
lastsil= siliques[which.max(pot)])
sil_tomerge<-join(sil_ply, maxd, by=c("ID","gen"),type="left",match="first")
sil_tomerge<-sil_tomerge[,c(1:5)]
#now we need a test if these are actually the furthest pot.
#they don't match 16 times in total (these silique numbers now get NA)
sil_tomerge$lastsil<-ifelse(sil_tomerge$lastpot_sil==0, sil_tomerge$lastsil,
ifelse(sil_tomerge$lastpot_sil==sil_tomerge$lastpot,sil_tomerge$lastsil,NA))
sil_tomerge<-sil_tomerge[,c(1,2,4)]
sil_tomerge$gen<-sil_tomerge$gen+1
#maxd<-join(maxd, sil_tomerge, by=c("ID","gen"),type="left",match="first")
#but this is not quite what I want, because what I'd really like to know is how many cm did the invasion move between generations to run this analysis. It's not going to be that helpful to know what the very furthest seed is.
maxdT0<-maxd
names(maxdT0)<-sub("farseed","farseedT0",names(maxdT0))
maxdT0$gen<-maxdT0$gen+1
maxdT0<-maxdT0[,c(1,2,4)]
distbygen<-merge(maxd,maxdT0, by=c("ID","gen"),all.x=T)
distbygen$farseedT0<-ifelse(is.na(distbygen$farseedT0),0,distbygen$farseedT0)
distbygen$cm_travel=distbygen$farseed-distbygen$farseedT0
distbygen<-distbygen[,c(1,2,5,7,9,10,12)]
#NOW, MERGE WITH SILIQUES! (now from the current generation)
distbygen_sil<-join(sil_tomerge, distbygen, by=c("ID","gen"),type="left",match="first")
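When adapting this script, the long run of ifelse() calls that moves dense last pots to their pot’s right edge could be collapsed into a lookup table; a sketch using the same pot-edge distances (note that pot 1 and pots 14–15 have no entry in the original ladder either, so they are left unchanged here too):

```r
# Right-edge position (cm) of each pot, copied from the ifelse() ladder above
pot_edge <- c("2" = 22, "3" = 29, "4" = 36, "5" = 44, "6" = 51, "7" = 58,
              "8" = 66, "9" = 73, "10" = 80, "11" = 88, "12" = 95,
              "13" = 102, "16" = 124, "17" = 132)

maxd$farseed[which(maxd$lastpot == 0)] <- 7  # pot 0 always gets its right edge
# move dense last pots (>= 10 seedlings) to the furthest right they could be
dense <- which(!is.na(maxd$lastsdlgs) & maxd$lastsdlgs >= 10 &
               as.character(maxd$lastpot) %in% names(pot_edge))
maxd$farseed[dense] <- pot_edge[as.character(maxd$lastpot[dense])]
maxd$farseed <- maxd$farseed - 7             # rescale so pot 0 is at 0
```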
In the short run I can use the ALL SEEDLING DATA section to replace the data import part of popLer.R.
I’ve revised popLer.R and popRIL.R in the data directory to use Jenn’s cleaned up data. Note, however, that this loses generation 7 from both experiments. I have a query in to Jenn about this.
Jenn has this to say about generation 7 of the evolution experiment (email of 19 May):
Gen7 was the mystery one… it had tons of contaminants in the genetic data, seedlings went weird distances (strangely far and/or not at all), and that’s when Sara was really having a ton of trouble. We didn’t include Gen7 in the Science paper because of all these potential challenges with it (although did analyze it to make sure the results didn’t all just go away!). So I think also better to leave it out of your analyses.
Need to check if the data have changed!
(Update of original analysis of 21 April 2017)
ggplot(Ler_seed_gen1, aes(x = eff_sd_no, fill = as.factor(Density))) +
geom_histogram(binwidth = 50)
kable(group_by(Ler_seed_gen1, Density) %>%
summarise(Mean = mean(eff_sd_no), Variance = var(eff_sd_no)),
caption = paste("Mean and variance across pots of effective seed number",
"in treatments B and C of Ler generation 1")
)
| Density | Mean | Variance |
|---|---|---|
| 1 | 133.9 | 3443.433 |
| 50 | 347.6 | 21362.933 |
This is identical to the original analysis.
This is the part of the 24 April analysis that looked at treatment B:
Ler3pBC <- filter(popLer, Gap == "3p", Treatment != "A")
Ler3pBC <- group_by(Ler3pBC, Treatment, Pot, Rep) %>%
mutate(Adults = lag(Seedlings))
Ler3pBC <- within(Ler3pBC,
{
Adults[Treatment == "B" & !is.na(Adults)] <- 1
Adults[Pot == 0 & Generation == 1] <- 1
Adults[Treatment == "C" & Pot == 0 & Generation == 1] <- 50
})
filter(Ler3pBC, Treatment == "B") %>%
ggplot(aes(x = Gen, y = Seedlings)) + xlab("Generation") + geom_boxplot(notch = TRUE)
notch went outside hinges. Try setting notch=FALSE.
notch went outside hinges. Try setting notch=FALSE.
notch went outside hinges. Try setting notch=FALSE.
By eye, this looks unchanged. Let’s try a log scale:
filter(Ler3pBC, Treatment == "B") %>%
ggplot(aes(x = Gen, y = Seedlings)) + xlab("Generation") + geom_boxplot(notch = TRUE) +
scale_y_log10()
notch went outside hinges. Try setting notch=FALSE.
notch went outside hinges. Try setting notch=FALSE.
notch went outside hinges. Try setting notch=FALSE.
notch went outside hinges. Try setting notch=FALSE.
notch went outside hinges. Try setting notch=FALSE.
That looks to mostly skew things in the other direction. Perhaps a square root transform?
filter(Ler3pBC, Treatment == "B") %>%
ggplot(aes(x = Gen, y = Seedlings)) + xlab("Generation") + geom_boxplot(notch = TRUE) +
scale_y_sqrt()
notch went outside hinges. Try setting notch=FALSE.
notch went outside hinges. Try setting notch=FALSE.
notch went outside hinges. Try setting notch=FALSE.
notch went outside hinges. Try setting notch=FALSE.
That’s nice. Now let’s do a linear model:
summary(lm(sqrt(Seedlings) ~ Gen, data = filter(Ler3pBC, Treatment == "B")))
Call:
lm(formula = sqrt(Seedlings) ~ Gen, data = filter(Ler3pBC, Treatment ==
"B"))
Residuals:
Min 1Q Median 3Q Max
-6.1230 -2.2023 -0.0066 1.8917 11.5703
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 8.4447 0.9746 8.665 8.8e-13 ***
Gen2 -0.9075 1.3466 -0.674 0.50251
Gen3 -1.6444 1.3466 -1.221 0.22601
Gen4 -2.8167 1.2582 -2.239 0.02827 *
Gen5 -1.4678 1.2582 -1.167 0.24724
Gen6 -3.9232 1.2424 -3.158 0.00232 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3.082 on 72 degrees of freedom
Multiple R-squared: 0.1576, Adjusted R-squared: 0.0991
F-statistic: 2.694 on 5 and 72 DF, p-value: 0.02744
It actually looks like it might be linear!
summary(lm(sqrt(Seedlings) ~ Generation, data = filter(Ler3pBC, Treatment == "B")))
Call:
lm(formula = sqrt(Seedlings) ~ Generation, data = filter(Ler3pBC,
Treatment == "B"))
Residuals:
Min 1Q Median 3Q Max
-6.1911 -1.9276 0.1918 1.5063 12.8627
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 8.8859 0.8608 10.323 4.08e-16 ***
Generation -0.6403 0.2074 -3.088 0.00282 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 3.081 on 76 degrees of freedom
Multiple R-squared: 0.1115, Adjusted R-squared: 0.09977
F-statistic: 9.533 on 1 and 76 DF, p-value: 0.002816
This is certainly the more parsimonious model. However, it’s not clear how to interpret it! A steady linear decline doesn’t accord with fluctuating effects of season or insect outbreaks. It could be that the team got better at inducing dispersal; one thing to check would be whether there is a similar trend in the other landscapes, or in the number of seeds jumping the gap.
OK, back to the reboot:
(also based on April 24)
qplot(data=Ler3pBC, x = Adults, y = Seedlings/Adults,
colour = Gen, log = "xy") +
geom_smooth(method = "lm")
Warning: Removed 14 rows containing non-finite values (stat_smooth).
Warning: Removed 14 rows containing missing values (geom_point).
Generation 6 looks really different.
It’s a bit less pronounced if we leave out treatment B, but it’s still there:
qplot(data=filter(Ler3pBC, Treatment == "C"), x = Adults, y = Seedlings/Adults,
colour = Gen, log = "xy") +
geom_smooth(method = "lm")
Warning: Removed 8 rows containing non-finite values (stat_smooth).
Warning: Removed 8 rows containing missing values (geom_point).
Is it just Gen 6, or is there still a linear trend in the intercept?
m1 <- lm(log(Seedlings/Adults) ~ Adults + Gen, data = Ler3pBC)
m2 <- lm(log(Seedlings/Adults) ~ Adults + Generation, data = Ler3pBC)
AIC(m1)
[1] 474.2151
AIC(m2)
[1] 481.8171
summary(m1)
Call:
lm(formula = log(Seedlings/Adults) ~ Adults + Gen, data = Ler3pBC)
Residuals:
Min 1Q Median 3Q Max
-4.2616 -0.8728 0.0230 0.8158 3.6531
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.9266472 0.2859379 10.235 <2e-16 ***
Adults -0.0052269 0.0003613 -14.468 <2e-16 ***
Gen2 0.0893939 0.4048630 0.221 0.8256
Gen3 0.5141724 0.4000966 1.285 0.2010
Gen4 0.7346576 0.4048698 1.815 0.0718 .
Gen5 0.4482726 0.3697123 1.212 0.2274
Gen6 -0.5019487 0.3731412 -1.345 0.1808
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.246 on 135 degrees of freedom
(14 observations deleted due to missingness)
Multiple R-squared: 0.6429, Adjusted R-squared: 0.627
F-statistic: 40.51 on 6 and 135 DF, p-value: < 2.2e-16
anova(m1)
Analysis of Variance Table
Response: log(Seedlings/Adults)
Df Sum Sq Mean Sq F value Pr(>F)
Adults 1 350.80 350.80 226.0342 < 2.2e-16 ***
Gen 5 26.39 5.28 3.4013 0.006314 **
Residuals 135 209.52 1.55
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(m2)
Call:
lm(formula = log(Seedlings/Adults) ~ Adults + Generation, data = Ler3pBC)
Residuals:
Min 1Q Median 3Q Max
-4.8110 -0.9846 0.1907 0.9253 2.9576
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.3590649 0.2666127 12.599 <2e-16 ***
Adults -0.0050648 0.0003674 -13.785 <2e-16 ***
Generation -0.0722116 0.0652395 -1.107 0.27
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.297 on 139 degrees of freedom
(14 observations deleted due to missingness)
Multiple R-squared: 0.6014, Adjusted R-squared: 0.5957
F-statistic: 104.9 on 2 and 139 DF, p-value: < 2.2e-16
Here, if anything, it looks quadratic:
m3 <- lm(log(Seedlings/Adults) ~ Adults + poly(Generation,2), data = Ler3pBC)
AIC(m3)
[1] 471.2079
anova(m3)
Analysis of Variance Table
Response: log(Seedlings/Adults)
Df Sum Sq Mean Sq F value Pr(>F)
Adults 1 350.80 350.80 226.2383 < 2.2e-16 ***
poly(Generation, 2) 2 21.93 10.97 7.0719 0.001192 **
Residuals 138 213.98 1.55
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(m3)
Call:
lm(formula = log(Seedlings/Adults) ~ Adults + poly(Generation,
2), data = Ler3pBC)
Residuals:
Min 1Q Median 3Q Max
-4.3838 -1.0034 0.0389 0.7699 3.5375
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.1352719 0.1256288 24.957 < 2e-16 ***
Adults -0.0052341 0.0003559 -14.708 < 2e-16 ***
poly(Generation, 2)1 -1.4438975 1.3083800 -1.104 0.271698
poly(Generation, 2)2 -4.7130036 1.3165803 -3.580 0.000475 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.245 on 138 degrees of freedom
(14 observations deleted due to missingness)
Multiple R-squared: 0.6353, Adjusted R-squared: 0.6274
F-statistic: 80.13 on 3 and 138 DF, p-value: < 2.2e-16
Maybe. We can also try dropping Gen 6:
m4 <- lm(log(Seedlings/Adults) ~ Adults + Gen, data = filter(Ler3pBC, Generation < 6))
m5 <- lm(log(Seedlings/Adults) ~ Adults + poly(Generation, 2),
data = filter(Ler3pBC, Generation < 6))
AIC(m4)
[1] 342.6575
AIC(m5)
[1] 340.3917
summary(m4)
Call:
lm(formula = log(Seedlings/Adults) ~ Adults + Gen, data = filter(Ler3pBC,
Generation < 6))
Residuals:
Min 1Q Median 3Q Max
-2.12284 -0.85786 0.03988 0.68163 2.45417
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.9377619 0.2510228 11.703 <2e-16 ***
Adults -0.0056859 0.0003939 -14.435 <2e-16 ***
Gen2 0.1190896 0.3556574 0.335 0.7384
Gen3 0.5993128 0.3538179 1.694 0.0933 .
Gen4 0.8720573 0.3621620 2.408 0.0178 *
Gen5 0.5209411 0.3265875 1.595 0.1137
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.093 on 105 degrees of freedom
(12 observations deleted due to missingness)
Multiple R-squared: 0.6716, Adjusted R-squared: 0.656
F-statistic: 42.95 on 5 and 105 DF, p-value: < 2.2e-16
anova(m4)
Analysis of Variance Table
Response: log(Seedlings/Adults)
Df Sum Sq Mean Sq F value Pr(>F)
Adults 1 247.509 247.509 207.0324 <2e-16 ***
Gen 4 9.201 2.300 1.9241 0.1118
Residuals 105 125.528 1.196
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(m5)
Call:
lm(formula = log(Seedlings/Adults) ~ Adults + poly(Generation,
2), data = filter(Ler3pBC, Generation < 6))
Residuals:
Min 1Q Median 3Q Max
-2.05718 -0.94013 0.09753 0.73707 2.36654
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.3867835 0.1246611 27.168 <2e-16 ***
Adults -0.0056051 0.0003878 -14.453 <2e-16 ***
poly(Generation, 2)1 2.4370676 1.1543458 2.111 0.0371 *
poly(Generation, 2)2 -1.5904080 1.1492391 -1.384 0.1693
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.092 on 107 degrees of freedom
(12 observations deleted due to missingness)
Multiple R-squared: 0.6664, Adjusted R-squared: 0.6571
F-statistic: 71.26 on 3 and 107 DF, p-value: < 2.2e-16
anova(m5)
Analysis of Variance Table
Response: log(Seedlings/Adults)
Df Sum Sq Mean Sq F value Pr(>F)
Adults 1 247.509 247.509 207.7053 < 2e-16 ***
poly(Generation, 2) 2 7.224 3.612 3.0313 0.05241 .
Residuals 107 127.505 1.192
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Yeah, that pretty much eliminates it all.
(Rerun of analyses from 5 and 8 May)
Look at trends by rep:
ggplot(LerC_spread, aes(x = Generation, y = speed, color = Rep)) +
geom_point(position = "jitter") +
geom_smooth(method = "lm", se = FALSE) +
facet_wrap(~Gap)
ggplot(LerC_spread, aes(x = Generation, y = speed)) +
geom_point(position = "jitter") +
geom_smooth() +
facet_wrap(~Gap)
ggplot(LerC_spread, aes(x = Generation, y = speed)) +
geom_point(position = "jitter") +
geom_smooth(method = "gam", method.args = list(k = 4)) +
facet_wrap(~Gap)
summary(lm(speed ~ Generation, data = filter(LerC_spread, Gap == "0p")))
Call:
lm(formula = speed ~ Generation, data = filter(LerC_spread, Gap ==
"0p"))
Residuals:
Min 1Q Median 3Q Max
-2.4705 -1.3090 0.0295 1.1224 3.5295
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.2933 0.4609 7.145 1.67e-09 ***
Generation -0.2743 0.1183 -2.318 0.024 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.566 on 58 degrees of freedom
Multiple R-squared: 0.08476, Adjusted R-squared: 0.06898
F-statistic: 5.371 on 1 and 58 DF, p-value: 0.02402
library(mgcv)
summary(gam(speed ~ s(Generation, k = 4), data = filter(LerC_spread, Gap == "0p")))
Family: gaussian
Link function: identity
Formula:
speed ~ s(Generation, k = 4)
Parametric coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.3333 0.1828 12.77 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Approximate significance of smooth terms:
edf Ref.df F p-value
s(Generation) 2.223 2.597 7.506 0.000784 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
R-sq.(adj) = 0.239 Deviance explained = 26.7%
GCV = 2.1184 Scale est. = 2.0046 n = 60
plot(gam(speed ~ s(Generation, k = 4), data = filter(LerC_spread, Gap == "0p")))
anova(gam(speed ~ s(Generation, k = 4), data = filter(LerC_spread, Gap == "0p")),
gam(speed ~ Generation, data = filter(LerC_spread, Gap == "0p")),
test = "Chisq")
Analysis of Deviance Table
Model 1: speed ~ s(Generation, k = 4)
Model 2: speed ~ Generation
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1 56.403 113.82
2 58.000 142.17 -1.5975 -28.351 0.0004792 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
This replicates the prior analysis exactly.
Now look at the autocorrelation and interaction analysis from 8 May:
m13 <- lm(speed ~ poly(Generation, 2) * Rep + speed_m1,
data = filter(LerC_spread, Gap == "0p"))
summary(m13)
Call:
lm(formula = speed ~ poly(Generation, 2) * Rep + speed_m1, data = filter(LerC_spread,
Gap == "0p"))
Residuals:
Min 1Q Median 3Q Max
-2.14668 -0.47051 0.02562 0.52439 2.46495
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.054e+00 9.512e-01 3.211 0.0046 **
poly(Generation, 2)1 1.990e+00 7.837e+00 0.254 0.8023
poly(Generation, 2)2 -2.395e+00 7.431e+00 -0.322 0.7507
Rep2 4.945e-01 1.154e+00 0.429 0.6730
Rep3 1.295e-01 1.159e+00 0.112 0.9122
Rep4 2.280e+00 1.154e+00 1.976 0.0629 .
Rep5 6.689e-01 1.166e+00 0.574 0.5730
Rep6 -6.265e-01 1.150e+00 -0.545 0.5922
Rep7 1.540e-02 1.160e+00 0.013 0.9895
Rep8 1.467e+00 1.155e+00 1.270 0.2194
Rep9 -9.598e-01 1.150e+00 -0.835 0.4142
Rep10 8.141e-01 1.148e+00 0.709 0.4869
speed_m1 -4.215e-01 1.982e-01 -2.126 0.0468 *
poly(Generation, 2)1:Rep2 -1.014e+01 1.115e+01 -0.910 0.3745
poly(Generation, 2)2:Rep2 -5.019e+00 1.087e+01 -0.462 0.6495
poly(Generation, 2)1:Rep3 -1.910e+01 1.109e+01 -1.722 0.1012
poly(Generation, 2)2:Rep3 7.551e+00 1.057e+01 0.714 0.4838
poly(Generation, 2)1:Rep4 -1.356e+01 1.122e+01 -1.208 0.2417
poly(Generation, 2)2:Rep4 3.330e+00 1.155e+01 0.288 0.7762
poly(Generation, 2)1:Rep5 -3.324e+00 1.110e+01 -0.299 0.7678
poly(Generation, 2)2:Rep5 -8.064e+00 1.046e+01 -0.771 0.4502
poly(Generation, 2)1:Rep6 -3.761e+00 1.109e+01 -0.339 0.7382
poly(Generation, 2)2:Rep6 5.375e-14 1.044e+01 0.000 1.0000
poly(Generation, 2)1:Rep7 -1.072e+01 1.111e+01 -0.965 0.3467
poly(Generation, 2)2:Rep7 -3.057e+00 1.080e+01 -0.283 0.7801
poly(Generation, 2)1:Rep8 5.176e+00 1.121e+01 0.462 0.6496
poly(Generation, 2)2:Rep8 -7.049e+00 1.053e+01 -0.669 0.5114
poly(Generation, 2)1:Rep9 2.805e+00 1.108e+01 0.253 0.8028
poly(Generation, 2)2:Rep9 -1.315e+01 1.057e+01 -1.244 0.2287
poly(Generation, 2)1:Rep10 -6.363e+00 1.115e+01 -0.571 0.5750
poly(Generation, 2)2:Rep10 -1.054e+01 1.087e+01 -0.970 0.3444
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.43 on 19 degrees of freedom
(10 observations deleted due to missingness)
Multiple R-squared: 0.7304, Adjusted R-squared: 0.3048
F-statistic: 1.716 on 30 and 19 DF, p-value: 0.1102
car::Anova(m13)
Anova Table (Type II tests)
Response: speed
Sum Sq Df F value Pr(>F)
poly(Generation, 2) 46.959 2 11.4785 0.0005389 ***
Rep 32.487 9 1.7646 0.1422649
speed_m1 9.249 1 4.5217 0.0467938 *
poly(Generation, 2):Rep 29.613 18 0.8043 0.6761230
Residuals 38.865 19
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Drop the interaction:
m14 <- lm(speed ~ poly(Generation, 2) + Rep + speed_m1,
data = filter(LerC_spread, Gap == "0p"))
summary(m14)
Call:
lm(formula = speed ~ poly(Generation, 2) + Rep + speed_m1, data = filter(LerC_spread,
Gap == "0p"))
Residuals:
Min 1Q Median 3Q Max
-2.89320 -0.68615 -0.06304 0.68486 2.49198
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.9649 0.7018 4.225 0.00015 ***
poly(Generation, 2)1 -4.2553 2.4272 -1.753 0.08786 .
poly(Generation, 2)2 -5.5072 2.3684 -2.325 0.02565 *
Rep2 0.2662 0.8609 0.309 0.75893
Rep3 -0.7985 0.8653 -0.923 0.36206
Rep4 1.5985 0.8653 1.847 0.07269 .
Rep5 0.7309 0.8738 0.836 0.40831
Rep6 -0.7323 0.8626 -0.849 0.40133
Rep7 -0.2662 0.8609 -0.309 0.75893
Rep8 1.7970 0.8797 2.043 0.04824 *
Rep9 -0.4000 0.8604 -0.465 0.64473
Rep10 0.8647 0.8690 0.995 0.32619
speed_m1 -0.3309 0.1526 -2.169 0.03660 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.36 on 37 degrees of freedom
(10 observations deleted due to missingness)
Multiple R-squared: 0.5251, Adjusted R-squared: 0.371
F-statistic: 3.409 on 12 and 37 DF, p-value: 0.002013
car::Anova(m14)
Anova Table (Type II tests)
Response: speed
Sum Sq Df F value Pr(>F)
poly(Generation, 2) 46.959 2 12.6865 6.372e-05 ***
Rep 32.487 9 1.9504 0.07456 .
speed_m1 8.705 1 4.7036 0.03660 *
Residuals 68.478 37
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
So, in conclusion: weak evidence for among-Rep differences in means, but strong evidence for negative autocorrelation. Note that this autocorrelation should act to slow the rate at which the variance in cumulative spread increases with time.
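That last point can be checked with a quick simulation — sketched here in Python rather than R, purely as a self-contained illustration. If per-generation speed is a stationary AR(1) with lag-1 correlation rho and marginal variance sigma^2, the variance of cumulative spread after n generations is sigma^2 * [n + 2 * sum_{k=1}^{n-1} (n-k) * rho^k], which falls below the iid ("random ratchet", linear-in-time) value whenever rho < 0. The rho = -0.4 below is just a round number near the fitted speed_m1 coefficients:

```python
import numpy as np

rng = np.random.default_rng(0)

def cum_spread_var(rho, n_gen=6, n_rep=200_000, sigma=1.0):
    """Empirical variance, across replicates, of cumulative spread after n_gen
    generations, when per-generation speed follows an AR(1) with lag-1 correlation rho."""
    speed = np.zeros(n_rep)
    total = np.zeros(n_rep)
    for _ in range(n_gen):
        # innovation scaled so the stationary variance of speed is sigma^2
        speed = rho * speed + sigma * np.sqrt(1 - rho**2) * rng.standard_normal(n_rep)
        total += speed
    return total.var()

v_neg = cum_spread_var(rho=-0.4)  # roughly the size of the fitted speed_m1 coefficients
v_iid = cum_spread_var(rho=0.0)   # iid baseline: variance grows linearly with time
print(v_neg < v_iid)              # negative autocorrelation shrinks inter-replicate variance
```

With these numbers the negative-autocorrelation variance comes out around half the iid value, so the effect is far from negligible over six generations.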
Let’s check out the other landscapes:
m24 <- lm(speed ~ Gen + Rep + speed_m1,
data = filter(LerC_spread, Gap == "1p"))
m34 <- lm(speed ~ Gen + Rep + speed_m1,
data = filter(LerC_spread, Gap == "2p"))
m44 <- lm(speed ~ Gen + Rep + speed_m1,
data = filter(LerC_spread, Gap == "3p"))
summary(m24)
Call:
lm(formula = speed ~ Gen + Rep + speed_m1, data = filter(LerC_spread,
Gap == "1p"))
Residuals:
Min 1Q Median 3Q Max
-1.9393 -1.0203 -0.2807 0.7435 3.0813
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.939e+00 8.046e-01 2.410 0.0213 *
Gen3 -6.966e-01 6.728e-01 -1.035 0.3076
Gen4 -6.345e-01 6.671e-01 -0.951 0.3481
Gen5 -2.000e-01 6.664e-01 -0.300 0.7658
Gen6 -1.131e+00 6.692e-01 -1.690 0.0999 .
Rep2 1.379e-01 9.504e-01 0.145 0.8855
Rep3 -1.100e-15 9.424e-01 0.000 1.0000
Rep4 -1.008e-15 9.424e-01 0.000 1.0000
Rep5 9.379e-01 9.504e-01 0.987 0.3305
Rep6 4.000e-01 9.424e-01 0.424 0.6738
Rep7 -3.310e-01 9.444e-01 -0.351 0.7280
Rep8 4.690e-01 9.444e-01 0.497 0.6226
Rep9 4.000e-01 9.424e-01 0.424 0.6738
Rep10 1.007e+00 9.604e-01 1.048 0.3017
speed_m1 -1.724e-01 1.543e-01 -1.117 0.2715
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.49 on 35 degrees of freedom
(10 observations deleted due to missingness)
Multiple R-squared: 0.1932, Adjusted R-squared: -0.1295
F-statistic: 0.5988 on 14 and 35 DF, p-value: 0.8474
summary(m34)
Call:
lm(formula = speed ~ Gen + Rep + speed_m1, data = filter(LerC_spread,
Gap == "2p"))
Residuals:
Min 1Q Median 3Q Max
-2.9139 -0.8927 -0.3292 0.9515 3.3861
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.206e+00 8.573e-01 2.573 0.0145 *
Gen3 -3.000e-01 7.061e-01 -0.425 0.6736
Gen4 -5.386e-02 7.076e-01 -0.076 0.9398
Gen5 3.010e-16 7.061e-01 0.000 1.0000
Gen6 -6.000e-01 7.061e-01 -0.850 0.4013
Rep2 1.077e-01 1.003e+00 0.107 0.9151
Rep3 -1.200e+00 9.986e-01 -1.202 0.2376
Rep4 -1.200e+00 9.986e-01 -1.202 0.2376
Rep5 -1.308e+00 1.003e+00 -1.304 0.2007
Rep6 -1.308e+00 1.003e+00 -1.304 0.2007
Rep7 7.077e-01 1.003e+00 0.706 0.4850
Rep8 -1.200e+00 9.986e-01 -1.202 0.2376
Rep9 -2.015e+00 1.015e+00 -1.985 0.0550 .
Rep10 -4.923e-01 1.003e+00 -0.491 0.6266
speed_m1 -1.795e-01 1.524e-01 -1.178 0.2469
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.579 on 35 degrees of freedom
(10 observations deleted due to missingness)
Multiple R-squared: 0.2666, Adjusted R-squared: -0.02679
F-statistic: 0.9087 on 14 and 35 DF, p-value: 0.558
summary(m44)
Call:
lm(formula = speed ~ Gen + Rep + speed_m1, data = filter(LerC_spread,
Gap == "3p"))
Residuals:
Min 1Q Median 3Q Max
-1.9660 -0.6326 -0.5617 0.2950 3.4950
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -3.660e-01 8.802e-01 -0.416 0.6804
Gen3 7.092e-02 7.694e-01 0.092 0.9271
Gen4 1.404e+00 7.694e-01 1.825 0.0776 .
Gen5 2.837e-01 8.417e-01 0.337 0.7384
Gen6 7.092e-02 7.694e-01 0.092 0.9271
Rep2 9.277e-01 1.038e+00 0.894 0.3782
Rep3 9.277e-01 1.038e+00 0.894 0.3782
Rep4 1.855e+00 1.073e+00 1.728 0.0939 .
Rep5 9.277e-01 1.038e+00 0.894 0.3782
Rep6 9.277e-01 1.038e+00 0.894 0.3782
Rep7 8.000e-01 1.025e+00 0.780 0.4412
Rep9 9.277e-01 1.038e+00 0.894 0.3782
Rep10 -8.426e-16 1.025e+00 0.000 1.0000
speed_m1 -1.596e-01 1.983e-01 -0.805 0.4272
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.621 on 31 degrees of freedom
(9 observations deleted due to missingness)
Multiple R-squared: 0.2256, Adjusted R-squared: -0.0991
F-statistic: 0.6948 on 13 and 31 DF, p-value: 0.7534
car::Anova(m24)
Anova Table (Type II tests)
Response: speed
Sum Sq Df F value Pr(>F)
Gen 7.794 4 0.8776 0.4873
Rep 7.984 9 0.3996 0.9268
speed_m1 2.772 1 1.2485 0.2715
Residuals 77.708 35
car::Anova(m34)
Anova Table (Type II tests)
Response: speed
Sum Sq Df F value Pr(>F)
Gen 2.708 4 0.2715 0.8943
Rep 28.733 9 1.2805 0.2819
speed_m1 3.458 1 1.3869 0.2469
Residuals 87.262 35
car::Anova(m44)
Anova Table (Type II tests)
Response: speed
Sum Sq Df F value Pr(>F)
Gen 12.533 4 1.1918 0.3339
Rep 10.906 8 0.5186 0.8332
speed_m1 1.702 1 0.6475 0.4272
Residuals 81.498 31
Nothing to see here, folks!
The pattern of autocorrelation is, I think, something we will want the model to reproduce.
And fortunately, nothing has changed from the prior analysis.
And finally double check the cumulative spread plots:
Let’s calculate the means and variances of cumulative spread:
cum_spread_stats <- group_by(LerC_spread, Gap, Gen, Generation) %>%
summarise(mean_spread = mean(Furthest),
var_spread = var(Furthest),
CV_spread = sqrt(var_spread)/mean_spread
)
And now plot the results:
ggplot(aes(x = Generation, y = mean_spread, color = Gap), data = cum_spread_stats) +
geom_point() + geom_smooth(method = "lm")
ggplot(aes(x = Generation, y = var_spread, color = Gap), data = cum_spread_stats) +
geom_point() + geom_line()
ggplot(aes(x = Generation, y = CV_spread, color = Gap), data = cum_spread_stats) +
geom_point() + geom_line()
Warning: Removed 1 rows containing missing values (geom_point).
Warning: Removed 1 rows containing missing values (geom_path).
This looks just like the previous results—repeating my previous summary,
The linear approximation to mean spread is pretty good, although we see a dip in generation 6, as expected from the reduced seed production, and even without gen 6 the continuous runways seem to be decelerating. The ranking among landscapes makes sense. The variances are rather more all over the map, although we don't have confidence intervals on them. But except for 1-pot gaps, we can squint and imagine that the variances are linear in time. Note, however, that with only 6 generations we're not going to easily detect nonlinearities. I don't know that the CVs tell us much.
It really does seem relevant to do the bootstrapping, to see whether the (rather nonsystematic) patterns in the cumulative variance vs. time are real.
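As a sketch of what that bootstrap might look like (in Python for a self-contained illustration; the `spread` vector is made-up, standing in for one generation's replicate-level cumulative spread): resample replicates with replacement and take percentile intervals on the variance.

```python
import numpy as np

rng = np.random.default_rng(1)

def boot_var_ci(x, n_boot=10_000, level=0.95):
    """Percentile-bootstrap confidence interval for the among-replicate variance of x."""
    x = np.asarray(x, dtype=float)
    # each bootstrap sample resamples whole replicates with replacement
    idx = rng.integers(0, len(x), size=(n_boot, len(x)))
    boot_vars = x[idx].var(axis=1, ddof=1)
    return tuple(np.quantile(boot_vars, [(1 - level) / 2, (1 + level) / 2]))

# hypothetical: ten replicates' cumulative spread (pots) in one generation
spread = [12, 14, 9, 15, 11, 13, 10, 16, 12, 14]
lo, hi = boot_var_ci(spread)
print(lo < np.var(spread, ddof=1) < hi)
```

Doing this per generation within each Gap would give rough CIs on the variance-vs-time curves; with only 10 replicates the intervals will be wide, which is itself useful to know before over-interpreting the wiggles.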
(From 10 May)
Let’s calculate the means and variances of cumulative spread:
cum_spread_stats <- group_by(RIL_spread, Treatment, Gap, Gen, Generation) %>%
summarise(mean_spread = mean(Furthest),
var_spread = var(Furthest),
CV_spread = sqrt(var_spread)/mean_spread
)
And now plot the results:
ggplot(aes(x = Generation, y = mean_spread, color = Treatment), data = cum_spread_stats) +
geom_point() + geom_line() + facet_wrap(~ Gap)
ggplot(aes(x = Generation, y = var_spread, color = Treatment), data = cum_spread_stats) +
geom_point() + geom_line() + facet_wrap(~ Gap)
ggplot(aes(x = Generation, y = CV_spread, color = Treatment), data = cum_spread_stats) +
geom_point() + geom_line() + facet_wrap(~ Gap)
Warning: Removed 1 rows containing missing values (geom_point).
The patterns in the mean are probably not overly distinguishable from linear, although I’d want to see CIs.
Let’s look at per-generation spread.
speed_stats <- group_by(RIL_spread, Treatment, Gap, Gen, Generation) %>%
summarise(mean_spread = mean(speed),
var_spread = var(speed),
CV_spread = sqrt(var_spread)/mean_spread
)
And now plot the results:
ggplot(aes(x = Generation, y = mean_spread, color = Treatment), data = speed_stats) +
geom_point() + geom_line() + facet_wrap(~ Gap)
ggplot(aes(x = Generation, y = var_spread, color = Treatment), data = speed_stats) +
geom_point() + geom_line() + facet_wrap(~ Gap)
ggplot(aes(x = Generation, y = CV_spread, color = Treatment), data = speed_stats) +
geom_point() + geom_line() + facet_wrap(~ Gap)
Warning: Removed 5 rows containing missing values (geom_point).
These plots look pretty similar (though not identical) to the second pass in the original analysis (after I dropped the problematic rep).
In 0p and 1p, the treatments look very similar. But I bet there will be differences in the autocorrelation structure. Let’s fit some models.
m1_0NO <- lm(speed ~ Gen + Rep + speed_m1, data = filter(RIL_spread, Gap == "0p", Treatment == "NO"))
summary(m1_0NO)
Call:
lm(formula = speed ~ Gen + Rep + speed_m1, data = filter(RIL_spread,
Gap == "0p", Treatment == "NO"))
Residuals:
Min 1Q Median 3Q Max
-1.67770 -0.77997 -0.03829 0.87593 1.87512
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.47770 0.62701 3.952 0.000359 ***
Gen3 -0.17399 0.50495 -0.345 0.732474
Gen4 -0.44960 0.49971 -0.900 0.374424
Gen5 0.10000 0.49871 0.201 0.842237
Gen6 1.15121 0.50767 2.268 0.029631 *
Rep2 -0.14960 0.70599 -0.212 0.833417
Rep3 0.95282 0.73925 1.289 0.205887
Rep4 0.10081 0.70812 0.142 0.887615
Rep5 1.00161 0.71655 1.398 0.170964
Rep6 -0.04879 0.71164 -0.069 0.945728
Rep7 0.55121 0.71164 0.775 0.443804
Rep8 1.05201 0.72281 1.455 0.154458
Rep9 1.50242 0.73039 2.057 0.047199 *
Rep10 1.55282 0.73925 2.101 0.042955 *
speed_m1 -0.25201 0.15821 -1.593 0.120183
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.115 on 35 degrees of freedom
(10 observations deleted due to missingness)
Multiple R-squared: 0.4056, Adjusted R-squared: 0.1678
F-statistic: 1.706 on 14 and 35 DF, p-value: 0.09938
car::Anova(m1_0NO)
Anova Table (Type II tests)
Response: speed
Sum Sq Df F value Pr(>F)
Gen 14.656 4 2.9464 0.03367 *
Rep 16.263 9 1.4531 0.20406
speed_m1 3.155 1 2.5372 0.12018
Residuals 43.525 35
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
m1_0YES <- lm(speed ~ Gen + Rep + speed_m1, data = filter(RIL_spread, Gap == "0p", Treatment == "YES"))
summary(m1_0YES)
Call:
lm(formula = speed ~ Gen + Rep + speed_m1, data = filter(RIL_spread,
Gap == "0p", Treatment == "YES"))
Residuals:
Min 1Q Median 3Q Max
-2.9829 -1.0494 -0.1159 1.3510 2.8446
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.883e+00 1.038e+00 3.743 0.000742 ***
Gen3 1.160e+00 8.272e-01 1.403 0.170584
Gen4 6.541e-01 8.501e-01 0.769 0.447472
Gen5 1.988e+00 8.278e-01 2.401 0.022530 *
Gen6 3.329e-01 8.875e-01 0.375 0.710116
Rep2 -8.888e-01 1.110e+00 -0.801 0.429366
Rep4 -6.665e-01 1.114e+00 -0.598 0.553872
Rep5 -8.885e-02 1.110e+00 -0.080 0.936718
Rep6 -1.777e-01 1.111e+00 -0.160 0.874011
Rep7 -5.554e-01 1.117e+00 -0.497 0.622542
Rep8 -1.089e+00 1.110e+00 -0.981 0.334217
Rep9 1.566e-15 1.110e+00 0.000 1.000000
Rep10 -1.554e-01 1.117e+00 -0.139 0.890260
speed_m1 -4.442e-01 1.610e-01 -2.759 0.009630 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.754 on 31 degrees of freedom
(9 observations deleted due to missingness)
Multiple R-squared: 0.4034, Adjusted R-squared: 0.1532
F-statistic: 1.612 on 13 and 31 DF, p-value: 0.135
car::Anova(m1_0YES)
Anova Table (Type II tests)
Response: speed
Sum Sq Df F value Pr(>F)
Gen 21.292 4 1.7295 0.16864
Rep 6.668 8 0.2708 0.97078
speed_m1 23.436 1 7.6148 0.00963 **
Residuals 95.408 31
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Here the treatments start to differ. Let's try some models with continuous Generation:
m1_0NO <- lm(speed ~ poly(Generation, 2) + Rep + speed_m1,
data = filter(RIL_spread, Gap == "0p", Treatment == "NO"))
summary(m1_0NO)
Call:
lm(formula = speed ~ poly(Generation, 2) + Rep + speed_m1, data = filter(RIL_spread,
Gap == "0p", Treatment == "NO"))
Residuals:
Min 1Q Median 3Q Max
-1.73288 -0.79389 -0.01207 0.84396 1.91342
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.71691 0.55729 4.875 2.07e-05 ***
poly(Generation, 2)1 0.30080 1.89328 0.159 0.8746
poly(Generation, 2)2 4.51283 1.78375 2.530 0.0158 *
Rep2 -0.15343 0.69108 -0.222 0.8255
Rep3 0.92600 0.72207 1.282 0.2077
Rep4 0.09314 0.69306 0.134 0.8938
Rep5 0.98628 0.70091 1.407 0.1677
Rep6 -0.06029 0.69634 -0.087 0.9315
Rep7 0.53971 0.69634 0.775 0.4432
Rep8 1.03286 0.70675 1.461 0.1523
Rep9 1.47943 0.71381 2.073 0.0452 *
Rep10 1.52600 0.72207 2.113 0.0414 *
speed_m1 -0.23286 0.15103 -1.542 0.1316
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.092 on 37 degrees of freedom
(10 observations deleted due to missingness)
Multiple R-squared: 0.3978, Adjusted R-squared: 0.2025
F-statistic: 2.037 on 12 and 37 DF, p-value: 0.0486
car::Anova(m1_0NO)
Anova Table (Type II tests)
Response: speed
Sum Sq Df F value Pr(>F)
poly(Generation, 2) 14.088 2 5.9110 0.00592 **
Rep 15.973 9 1.4893 0.18820
speed_m1 2.833 1 2.3772 0.13163
Residuals 44.093 37
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
m1_0YES <- lm(speed ~ poly(Generation, 2) + Rep + speed_m1,
data = filter(RIL_spread, Gap == "0p", Treatment == "YES"))
summary(m1_0YES)
Call:
lm(formula = speed ~ poly(Generation, 2) + Rep + speed_m1, data = filter(RIL_spread,
Gap == "0p", Treatment == "YES"))
Residuals:
Min 1Q Median 3Q Max
-2.8015 -1.1626 -0.3335 1.1958 2.9824
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.591e+00 9.727e-01 4.720 4.2e-05 ***
poly(Generation, 2)1 5.606e+00 3.155e+00 1.777 0.08480 .
poly(Generation, 2)2 -4.831e+00 2.924e+00 -1.652 0.10799
Rep2 -9.053e-01 1.129e+00 -0.802 0.42827
Rep4 -7.159e-01 1.132e+00 -0.632 0.53153
Rep5 -1.053e-01 1.129e+00 -0.093 0.92623
Rep6 -2.106e-01 1.130e+00 -0.186 0.85329
Rep7 -6.212e-01 1.135e+00 -0.547 0.58790
Rep8 -1.105e+00 1.129e+00 -0.979 0.33459
Rep9 8.513e-16 1.128e+00 0.000 1.00000
Rep10 -2.212e-01 1.135e+00 -0.195 0.84668
speed_m1 -5.266e-01 1.566e-01 -3.363 0.00196 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.784 on 33 degrees of freedom
(9 observations deleted due to missingness)
Multiple R-squared: 0.3432, Adjusted R-squared: 0.1242
F-statistic: 1.567 on 11 and 33 DF, p-value: 0.1549
car::Anova(m1_0YES)
Anova Table (Type II tests)
Response: speed
Sum Sq Df F value Pr(>F)
poly(Generation, 2) 11.668 2 1.8329 0.175855
Rep 6.826 8 0.2681 0.971930
speed_m1 35.996 1 11.3097 0.001965 **
Residuals 105.032 33
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
m1_0YES <- lm(speed ~ Generation + Rep + speed_m1,
data = filter(RIL_spread, Gap == "0p", Treatment == "YES"))
summary(m1_0YES)
Call:
lm(formula = speed ~ Generation + Rep + speed_m1, data = filter(RIL_spread,
Gap == "0p", Treatment == "YES"))
Residuals:
Min 1Q Median 3Q Max
-2.3642 -1.2625 -0.3097 1.4350 3.2240
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.297e+00 1.159e+00 3.708 0.00074 ***
Generation 1.923e-01 2.037e-01 0.944 0.35188
Rep2 -9.097e-01 1.157e+00 -0.786 0.43718
Rep4 -7.292e-01 1.161e+00 -0.628 0.53400
Rep5 -1.097e-01 1.157e+00 -0.095 0.92500
Rep6 -2.195e-01 1.158e+00 -0.189 0.85085
Rep7 -6.390e-01 1.164e+00 -0.549 0.58654
Rep8 -1.110e+00 1.157e+00 -0.959 0.34430
Rep9 1.824e-16 1.157e+00 0.000 1.00000
Rep10 -2.390e-01 1.164e+00 -0.205 0.83852
speed_m1 -5.487e-01 1.599e-01 -3.431 0.00159 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.829 on 34 degrees of freedom
(9 observations deleted due to missingness)
Multiple R-squared: 0.2889, Adjusted R-squared: 0.07969
F-statistic: 1.381 on 10 and 34 DF, p-value: 0.2305
car::Anova(m1_0YES)
Anova Table (Type II tests)
Response: speed
Sum Sq Df F value Pr(>F)
Generation 2.980 1 0.8909 0.351884
Rep 6.878 8 0.2570 0.975431
speed_m1 39.380 1 11.7738 0.001595 **
Residuals 113.720 34
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
So we get that the evolution treatment has negative autocorrelation but no time effect, whereas the no-evolution treatment has a quadratic time effect but no autocorrelation.
Let’s look at the rest of the landscapes.
m2_0NO <- lm(speed ~ poly(Generation, 2) + Rep + speed_m1,
data = filter(RIL_spread, Gap == "1p", Treatment == "NO"))
summary(m2_0NO)
Call:
lm(formula = speed ~ poly(Generation, 2) + Rep + speed_m1, data = filter(RIL_spread,
Gap == "1p", Treatment == "NO"))
Residuals:
Min 1Q Median 3Q Max
-2.71489 -0.58274 0.01871 0.55243 2.84783
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.267502 0.575288 2.203 0.03388 *
poly(Generation, 2)1 5.782657 2.092024 2.764 0.00884 **
poly(Generation, 2)2 0.001457 2.059857 0.001 0.99944
Rep2 0.686016 0.773317 0.887 0.38075
Rep3 1.229024 0.785643 1.564 0.12625
Rep4 0.143008 0.765825 0.187 0.85289
Rep5 0.143008 0.765825 0.187 0.85289
Rep6 -0.256992 0.765825 -0.336 0.73909
Rep7 1.229024 0.785643 1.564 0.12625
Rep8 1.086016 0.773317 1.404 0.16855
Rep9 0.686016 0.773317 0.887 0.38075
Rep10 -1.086016 0.773317 -1.404 0.16855
speed_m1 -0.357520 0.154986 -2.307 0.02677 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.207 on 37 degrees of freedom
(10 observations deleted due to missingness)
Multiple R-squared: 0.4012, Adjusted R-squared: 0.207
F-statistic: 2.066 on 12 and 37 DF, p-value: 0.0454
car::Anova(m2_0NO)
Anova Table (Type II tests)
Response: speed
Sum Sq Df F value Pr(>F)
poly(Generation, 2) 18.403 2 6.3170 0.004363 **
Rep 20.014 9 1.5267 0.174941
speed_m1 7.751 1 5.3213 0.026767 *
Residuals 53.895 37
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
m2_0YES <- lm(speed ~ poly(Generation, 2) + Rep + speed_m1,
data = filter(RIL_spread, Gap == "1p", Treatment == "YES"))
summary(m2_0YES)
Call:
lm(formula = speed ~ poly(Generation, 2) + Rep + speed_m1, data = filter(RIL_spread,
Gap == "1p", Treatment == "YES"))
Residuals:
Min 1Q Median 3Q Max
-2.7791 -0.9350 -0.1574 0.8140 3.1582
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.99237 0.69218 1.434 0.1601
poly(Generation, 2)1 -1.26659 2.60077 -0.487 0.6291
poly(Generation, 2)2 2.01957 2.48615 0.812 0.4218
Rep2 1.42446 0.97112 1.467 0.1509
Rep3 0.54964 0.95827 0.574 0.5697
Rep4 0.07482 0.95049 0.079 0.9377
Rep5 0.14964 0.95827 0.156 0.8768
Rep6 1.34964 0.95827 1.408 0.1674
Rep7 1.64891 1.03772 1.589 0.1206
Rep8 2.37409 1.01111 2.348 0.0243 *
Rep9 0.87482 0.95049 0.920 0.3633
Rep10 0.54964 0.95827 0.574 0.5697
speed_m1 -0.18705 0.17598 -1.063 0.2947
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.499 on 37 degrees of freedom
(10 observations deleted due to missingness)
Multiple R-squared: 0.2339, Adjusted R-squared: -0.01459
F-statistic: 0.9413 on 12 and 37 DF, p-value: 0.5184
car::Anova(m2_0YES)
Anova Table (Type II tests)
Response: speed
Sum Sq Df F value Pr(>F)
poly(Generation, 2) 1.487 2 0.3311 0.7202
Rep 22.920 9 1.1338 0.3646
speed_m1 2.537 1 1.1297 0.2947
Residuals 83.108 37
m2_0NO <- lm(speed ~ poly(Generation, 2) + Rep + speed_m1,
data = filter(RIL_spread, Gap == "2p", Treatment == "NO"))
summary(m2_0NO)
Call:
lm(formula = speed ~ poly(Generation, 2) + Rep + speed_m1, data = filter(RIL_spread,
Gap == "2p", Treatment == "NO"))
Residuals:
Min 1Q Median 3Q Max
-2.3582 -0.9472 -0.0689 0.4769 4.5809
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.9925 0.7068 1.404 0.1686
poly(Generation, 2)1 -6.3086 2.6583 -2.373 0.0229 *
poly(Generation, 2)2 4.4654 2.6698 1.673 0.1029
Rep2 -0.6000 0.9690 -0.619 0.5396
Rep3 1.5067 1.0179 1.480 0.1473
Rep4 1.4045 0.9910 1.417 0.1648
Rep5 -0.6000 0.9690 -0.619 0.5396
Rep6 0.1022 0.9745 0.105 0.9170
Rep7 0.7022 0.9745 0.721 0.4757
Rep8 0.8045 0.9910 0.812 0.4221
Rep9 0.8045 0.9910 0.812 0.4221
Rep10 -0.6000 0.9690 -0.619 0.5396
speed_m1 -0.1704 0.1731 -0.984 0.3314
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.532 on 37 degrees of freedom
(10 observations deleted due to missingness)
Multiple R-squared: 0.3146, Adjusted R-squared: 0.09233
F-statistic: 1.415 on 12 and 37 DF, p-value: 0.2027
car::Anova(m2_0NO)
Anova Table (Type II tests)
Response: speed
Sum Sq Df F value Pr(>F)
poly(Generation, 2) 13.360 2 2.8457 0.07086 .
Rep 24.258 9 1.1483 0.35548
speed_m1 2.274 1 0.9687 0.33139
Residuals 86.852 37
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
m2_0YES <- lm(speed ~ poly(Generation, 2) + Rep + speed_m1,
data = filter(RIL_spread, Gap == "2p", Treatment == "YES"))
summary(m2_0YES)
Call:
lm(formula = speed ~ poly(Generation, 2) + Rep + speed_m1, data = filter(RIL_spread,
Gap == "2p", Treatment == "YES"))
Residuals:
Min 1Q Median 3Q Max
-2.7756 -1.6122 -0.0111 1.1509 4.0250
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.145e+00 9.297e-01 1.231 0.2268
poly(Generation, 2)1 2.677e+00 3.742e+00 0.716 0.4793
poly(Generation, 2)2 -9.610e+00 3.614e+00 -2.660 0.0120 *
Rep2 5.398e-16 1.266e+00 0.000 1.0000
Rep3 8.006e-01 1.270e+00 0.630 0.5329
Rep4 -8.006e-01 1.270e+00 -0.630 0.5329
Rep5 8.006e-01 1.270e+00 0.630 0.5329
Rep7 -8.006e-01 1.270e+00 -0.630 0.5329
Rep8 1.601e+00 1.282e+00 1.249 0.2206
Rep9 -5.473e-16 1.266e+00 0.000 1.0000
Rep10 8.006e-01 1.270e+00 0.630 0.5329
speed_m1 -3.343e-01 1.678e-01 -1.992 0.0547 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.002 on 33 degrees of freedom
(9 observations deleted due to missingness)
Multiple R-squared: 0.3065, Adjusted R-squared: 0.07535
F-statistic: 1.326 on 11 and 33 DF, p-value: 0.2543
car::Anova(m2_0YES)
Anova Table (Type II tests)
Response: speed
Sum Sq Df F value Pr(>F)
poly(Generation, 2) 38.775 2 4.8352 0.01440 *
Rep 23.282 8 0.7258 0.66772
speed_m1 15.911 1 3.9683 0.05469 .
Residuals 132.317 33
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
m2_0NO <- lm(speed ~ poly(Generation, 2) + Rep + speed_m1,
data = filter(RIL_spread, Gap == "3p", Treatment == "NO"))
summary(m2_0NO)
Call:
lm(formula = speed ~ poly(Generation, 2) + Rep + speed_m1, data = filter(RIL_spread,
Gap == "3p", Treatment == "NO"))
Residuals:
Min 1Q Median 3Q Max
-1.9572 -0.6343 -0.1966 0.2101 3.2495
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 9.520e-01 6.044e-01 1.575 0.124
poly(Generation, 2)1 -1.301e+00 2.274e+00 -0.572 0.571
poly(Generation, 2)2 2.980e+00 2.165e+00 1.376 0.177
Rep2 1.606e-01 8.432e-01 0.190 0.850
Rep3 -8.000e-01 8.297e-01 -0.964 0.341
Rep4 -8.000e-01 8.297e-01 -0.964 0.341
Rep5 -8.000e-01 8.297e-01 -0.964 0.341
Rep6 9.606e-01 8.432e-01 1.139 0.262
Rep7 -8.000e-01 8.297e-01 -0.964 0.341
Rep8 1.606e-01 8.432e-01 0.190 0.850
Rep9 -1.216e-16 8.297e-01 0.000 1.000
Rep10 -6.394e-01 8.432e-01 -0.758 0.453
speed_m1 -2.008e-01 1.879e-01 -1.068 0.292
Residual standard error: 1.312 on 37 degrees of freedom
(10 observations deleted due to missingness)
Multiple R-squared: 0.2462, Adjusted R-squared: 0.001712
F-statistic: 1.007 on 12 and 37 DF, p-value: 0.4621
car::Anova(m2_0NO)
Anova Table (Type II tests)
Response: speed
Sum Sq Df F value Pr(>F)
poly(Generation, 2) 3.518 2 1.0221 0.3698
Rep 15.658 9 1.0109 0.4491
speed_m1 1.964 1 1.1411 0.2923
Residuals 63.682 37
m2_0YES <- lm(speed ~ poly(Generation, 2) + Rep + speed_m1,
data = filter(RIL_spread, Gap == "3p", Treatment == "YES"))
summary(m2_0YES)
Call:
lm(formula = speed ~ poly(Generation, 2) + Rep + speed_m1, data = filter(RIL_spread,
Gap == "3p", Treatment == "YES"))
Residuals:
Min 1Q Median 3Q Max
-3.2641 -1.1870 -0.2878 1.0908 5.1321
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.744e+00 9.449e-01 3.962 0.000443 ***
poly(Generation, 2)1 4.688e+00 3.527e+00 1.329 0.194157
poly(Generation, 2)2 -4.498e+00 3.219e+00 -1.398 0.172836
Rep2 -1.944e+00 1.251e+00 -1.554 0.131072
Rep3 -1.372e+00 1.235e+00 -1.111 0.275663
Rep4 -1.373e-15 1.229e+00 0.000 1.000000
Rep7 -5.718e-01 1.235e+00 -0.463 0.646748
Rep8 -1.372e+00 1.235e+00 -1.111 0.275663
Rep9 -2.744e+00 1.251e+00 -2.193 0.036444 *
Rep10 -2.744e+00 1.251e+00 -2.193 0.036444 *
speed_m1 -7.147e-01 1.447e-01 -4.940 3e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.944 on 29 degrees of freedom
(8 observations deleted due to missingness)
Multiple R-squared: 0.503, Adjusted R-squared: 0.3316
F-statistic: 2.935 on 10 and 29 DF, p-value: 0.01141
car::Anova(m2_0YES)
Anova Table (Type II tests)
Response: speed
Sum Sq Df F value Pr(>F)
poly(Generation, 2) 8.499 2 1.1251 0.3384
Rep 39.412 7 1.4906 0.2097
speed_m1 92.175 1 24.4029 2.998e-05 ***
Residuals 109.539 29
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
For evolution YES, we have:

- 0p: strong negative autocorrelation (p = 0.002), no Generation effect
- 1p: no significant effects
- 2p: a quadratic Generation effect (p = 0.014) and marginally negative autocorrelation (p = 0.055)
- 3p: strong negative autocorrelation (p < 0.0001), no Generation effect

For evolution NO, we have:

- 0p: a quadratic Generation effect (p = 0.006), no significant autocorrelation
- 1p: a Generation effect (p = 0.004) and negative autocorrelation (p = 0.027)
- 2p: a marginal Generation effect (p = 0.071), no significant autocorrelation
- 3p: no significant effects
Well, so what? This will require a bit of puzzling.
One thing to notice: the per-generation variance in spread is, on average, higher in the evolution treatment across all landscapes. However, this translates into a higher variance in cumulative spread only in landscapes 1p and 3p (and I'm guessing the CI on 3p is large). 1p is the landscape with no significant autocorrelation in the evolution treatment. However, 3p has a large negative AC.
OK, here’s the part I’ve been dreading: reconstructing the models from almost 2 years ago…
Well, the first thing I found noodling around is some models of Ler DD fecundity in LerSims. But then I got distracted thinking about sources of stochasticity. First, we actually have estimates of demographic stochasticity in the RIL density experiments, as siliques are counted on multiple plants per pot. See fecundity_variability. I can't remember whether any of the silique counts in the populations were actually recorded by individual plant—I suspect not.
Second, there are two aspects of “environmental stochasticity.” The first is location of the pot within the greenhouse, which could affect both temperature/light and exposure to pests. In this case, we would expect nearby pots to have similar residuals. We don’t have a map of where pots were, but we do know that pots in the same runway were close to one another—thus we’d expect, within a given generation, less variability among pots in a runway than overall across all runways.
However, another source is inconsistency in the watering treatment, which affects both the total number of seeds released per silique (effective fecundity) and the dispersal kernel (in addition to creating heterogeneous dispersal, this will create extra variability in the number of seeds staying in the home pot, which is our measure of fecundity in many cases). This could easily vary from pot to pot within a runway.
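The pot-position idea amounts to a prediction about variance components: within a generation, residual variance within a runway should be smaller than the variance pooled across all runways. A toy simulation (again Python as a standalone illustration; the runway- and pot-level standard deviations are invented, not estimates) of what that signature looks like:

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated fecundity residuals: a shared runway effect plus independent
# pot-level noise. The magnitudes are made up, purely for illustration.
n_runways, pots_per_runway = 10, 8
sd_runway, sd_pot = 1.0, 0.5
runway_effect = rng.normal(0.0, sd_runway, size=n_runways)
resid = runway_effect[:, None] + rng.normal(0.0, sd_pot, size=(n_runways, pots_per_runway))

within = resid.var(axis=1, ddof=1).mean()  # average variance within a runway
total = resid.var(ddof=1)                  # variance pooling all pots, all runways
print(within < total)                      # shared runway conditions inflate the pooled variance
```

In the real data the analogous comparison (or a runway random effect) would let us gauge how much of the residual variation is positional versus pot-specific, e.g. watering inconsistency, which should show up at the pot level instead.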
I’ve loaded in the silique count data from ~Arabidopsis/analysis/*_siliques.csv. I’m not actually sure how “clean” these are; I had to add the gap treatments to generation zero, and in the process found one error in generation 1; I didn’t look at the other generations.
The script is in data/popLer_sil.R and the data frame is popLer_sil.
Let’s look at silique production in treatment B as a function of replicate (here labeled “ID”) and generation. If my speculations last week hold, then ID:Gen should be significant, but probably not the main effect of ID.
In order to have enough replication to get at the interaction, I’ll look only at the continuous runways and not at generation zero.
sil_data <- subset(popLer_sil, Treatment == "B" & Generation > 0 & Gap == "0p")
summary(aov(Siliques ~ Gen * ID, data = sil_data))
Df Sum Sq Mean Sq F value Pr(>F)
Gen 4 116793 29198 16.377 4.22e-12 ***
ID 10 65216 6522 3.658 0.00013 ***
Gen:ID 37 201238 5439 3.051 8.07e-08 ***
Residuals 288 513482 1783
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
So I’m not sure how to interpret the main effect of ID. Let’s look at the coefficients of a regression to see.
summary(lm(Siliques ~ Gen * ID, data = sil_data))
Call:
lm(formula = Siliques ~ Gen * ID, data = sil_data)
Residuals:
Min 1Q Median 3Q Max
-131.600 -21.557 -1.404 18.703 157.000
Coefficients: (3 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 120.0000 29.8573 4.019 7.46e-05 ***
Gen2 -58.5000 36.5676 -1.600 0.1107
Gen3 -53.1111 33.0085 -1.609 0.1087
Gen4 -2.6923 32.0719 -0.084 0.9332
Gen5 -6.8125 31.6685 -0.215 0.8298
ID84 49.5000 42.2247 1.172 0.2420
ID88 -38.0000 38.5457 -0.986 0.3250
ID91 13.0000 42.2247 0.308 0.7584
ID92 -86.1875 43.5242 -1.980 0.0486 *
ID94 -22.5000 42.2247 -0.533 0.5945
ID96 -40.0000 51.7144 -0.773 0.4399
ID99 -14.0000 42.2247 -0.332 0.7405
ID103 -18.7500 36.5676 -0.513 0.6085
ID107 -3.3333 38.5457 -0.086 0.9311
ID108 -5.0000 42.2247 -0.118 0.9058
Gen2:ID84 11.7500 51.7144 0.227 0.8204
Gen3:ID84 -46.0556 47.7303 -0.965 0.3354
Gen4:ID84 -123.4744 47.0874 -2.622 0.0092 **
Gen5:ID84 -93.0875 45.5263 -2.045 0.0418 *
Gen2:ID88 70.1000 47.8339 1.465 0.1439
Gen3:ID88 34.1111 44.0292 0.775 0.4391
Gen4:ID88 7.2637 43.3315 0.168 0.8670
Gen5:ID88 81.4125 42.1366 1.932 0.0543 .
Gen2:ID91 6.8333 50.2574 0.136 0.8919
Gen3:ID91 -8.3333 46.6811 -0.179 0.8584
Gen4:ID91 -48.5804 45.6306 -1.065 0.2879
Gen5:ID91 -1.1875 45.1988 -0.026 0.9791
Gen2:ID92 NA NA NA NA
Gen3:ID92 NA NA NA NA
Gen4:ID92 26.8798 61.7610 0.435 0.6637
Gen5:ID92 NA NA NA NA
Gen2:ID94 38.6000 50.8452 0.759 0.4484
Gen3:ID94 2.8111 48.3488 0.058 0.9537
Gen4:ID94 -13.3077 47.0874 -0.283 0.7777
Gen5:ID94 116.3125 45.7434 2.543 0.0115 *
Gen2:ID96 59.8333 60.9461 0.982 0.3271
Gen3:ID96 42.9111 56.8249 0.755 0.4508
Gen4:ID96 -20.8077 55.7556 -0.373 0.7093
Gen5:ID96 41.9554 55.1409 0.761 0.4474
Gen2:ID99 52.5000 50.8452 1.033 0.3027
Gen3:ID99 1.2361 46.9456 0.026 0.9790
Gen4:ID99 -55.4327 46.2919 -1.197 0.2321
Gen5:ID99 -35.9653 45.7434 -0.786 0.4324
Gen2:ID103 59.0500 46.2548 1.277 0.2028
Gen3:ID103 29.9722 41.6341 0.720 0.4722
Gen4:ID103 -10.8077 40.2854 -0.268 0.7887
Gen5:ID103 -23.5089 39.6986 -0.592 0.5542
Gen2:ID107 38.8333 48.7568 0.796 0.4264
Gen3:ID107 -0.7556 45.1714 -0.017 0.9867
Gen4:ID107 -29.3744 44.4916 -0.660 0.5096
Gen5:ID107 -40.8542 43.0338 -0.949 0.3432
Gen2:ID108 51.6667 50.2574 1.028 0.3048
Gen3:ID108 10.5556 46.6811 0.226 0.8213
Gen4:ID108 -45.0769 45.3566 -0.994 0.3211
Gen5:ID108 28.6125 44.8689 0.638 0.5242
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 42.22 on 288 degrees of freedom
Multiple R-squared: 0.4274, Adjusted R-squared: 0.326
F-statistic: 4.215 on 51 and 288 DF, p-value: 3.016e-15
anova(lm(Siliques ~ Gen * ID, data = sil_data))
Analysis of Variance Table
Response: Siliques
Df Sum Sq Mean Sq F value Pr(>F)
Gen 4 116793 29198.3 16.3766 4.222e-12 ***
ID 10 65216 6521.6 3.6578 0.0001295 ***
Gen:ID 37 201238 5438.9 3.0505 8.074e-08 ***
Residuals 288 513482 1782.9
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Oh, I see - aov still uses the default contrasts.
## Set orthogonal contrasts.
op <- options(contrasts = c("contr.helmert", "contr.poly"))
summary(aov(Siliques ~ Gen * ID, data = sil_data))
Df Sum Sq Mean Sq F value Pr(>F)
Gen 4 116793 29198 16.377 4.22e-12 ***
ID 10 65216 6522 3.658 0.00013 ***
Gen:ID 37 201238 5439 3.051 8.07e-08 ***
Residuals 288 513482 1783
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
options(op) # reset to previous
Nope, that didn’t do it either.
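Part of the confusion may be that anova() reports sequential (Type I) sums of squares: the contrasts setting never changes that table, but term order does when the design is unbalanced. A quick check (assuming sil_data as above) is to reverse the order and see whether the ID main-effect SS changes:

```r
## Sequential (Type I) SS: above, ID is assessed after Gen; reversing the term
## order changes the main-effect SS in an unbalanced design, while changing the
## contrasts does not.
anova(lm(Siliques ~ ID * Gen, data = sil_data))
```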
To get around the confusing results last time, let’s just create a new factor variable that is Gen:ID, so that each combo is treated independently.
sil_data <- subset(popLer_sil, Treatment == "B" & Generation > 0 & Gap == "0p")
sil_data$GenID <- with(sil_data, interaction(Gen, ID))
summary(aov(Siliques ~ Gen + GenID, data = sil_data))
Df Sum Sq Mean Sq F value Pr(>F)
Gen 4 116793 29198 16.38 4.22e-12 ***
GenID 47 266454 5669 3.18 1.28e-09 ***
Residuals 288 513482 1783
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
OK, that’s clearer. After accounting for inter-generational variation, the among-runway (GenID) mean square is about three times the residual (within-runway) mean square, so there is real structure among runways beyond the pot-to-pot variation within them.
ggplot(aes(Gen, Siliques, group = ID, color = ID), data = sil_data) + geom_point() +
geom_smooth(aes(Generation, Siliques), span = 1, se = FALSE)
`geom_smooth()` using method = 'loess'
Let’s look at how the residual variance scales with the mean.
sil_stats <- group_by(sil_data, Gen, ID) %>%
summarise(Mean = mean(Siliques),
Var = var(Siliques),
SD = sqrt(Var))
sil_stats <- sil_stats[complete.cases(sil_stats), ]
ggplot(aes(Mean, Var, color = ID), data = sil_stats) + geom_point()
ggplot(aes(Mean, SD, color = ID), data = sil_stats) + geom_point()
ggplot(aes(Mean, Var), data = sil_stats) + geom_point() + geom_smooth(method = "lm")
ggplot(aes(Mean, SD), data = sil_stats) + geom_point() + geom_smooth(method = "lm")
The relationship between the SD and mean looks a bit more linear, although there’s still a lot of scatter.
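One way to quantify the scaling is Taylor’s power law, Var = a · Mean^b, fit on the log scale; b near 1 suggests Poisson-like (demographic) scaling, b near 2 a constant CV. A sketch, assuming the sil_stats data frame built above:

```r
## Taylor's power law sketch: fit Var = a * Mean^b on the log scale
## (assumes sil_stats from above; groups with zero variance are dropped
## so log(Var) is finite).
tpl <- lm(log(Var) ~ log(Mean), data = subset(sil_stats, Var > 0))
coef(tpl)["log(Mean)"]  # b ~ 1: Poisson-like scaling; b ~ 2: constant CV
```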
Now let’s see if the seed:silique ratio is structured in any way, or is independent among pots.
sil_data <- subset(popLer_sil, Treatment == "B" & Gap == "3p")
sil_data$Generation <- sil_data$Generation + 1
sil_data$Gen <- as.factor(sil_data$Generation)
popLer$ID <- as.factor(popLer$ID)
sil_seed_data <- left_join(sil_data, popLer)
Joining, by = c("Generation", "ID", "Treatment", "Gap", "Pot", "Gen")
Warning in left_join_impl(x, y, by$x, by$y, suffix$x, suffix$y): joining
factors with different levels, coercing to character vector
Warning in left_join_impl(x, y, by$x, by$y, suffix$x, suffix$y): joining
factors with different levels, coercing to character vector
Warning in left_join_impl(x, y, by$x, by$y, suffix$x, suffix$y): joining
factor and character vector, coercing into character vector
ggplot(aes(Siliques, Seedlings, color = Gen), data = sil_seed_data) +
geom_point() + geom_smooth(method = "lm")
Warning: Removed 2 rows containing non-finite values (stat_smooth).
Warning: Removed 2 rows containing missing values (geom_point).
Within each generation, the number of seedlings in the home pot is independent of the number of siliques!!!
Now include the dispersing seeds in gen 1.
sil_data <- subset(popLer_sil, Treatment == "B" & Gap == "0p" & Gen == "0")
sil_data$Generation <- sil_data$Generation + 1
sil_data$Gen <- as.factor(sil_data$Generation)
Ler_seed_gen1$ID <- as.factor(Ler_seed_gen1$ID)
sil_seed_data <- left_join(sil_data, Ler_seed_gen1)
Joining, by = "ID"
Warning in left_join_impl(x, y, by$x, by$y, suffix$x, suffix$y): joining
factors with different levels, coercing to character vector
ggplot(aes(Siliques, eff_sd_no), data = sil_seed_data) + geom_point() +
geom_smooth(method = "lm")
ggplot(aes(Siliques, home), data = sil_seed_data) + geom_point() +
geom_smooth(method = "lm")
ggplot(aes(Siliques, stay_prop), data = sil_seed_data) + geom_point() +
geom_smooth(method = "lm")
So the lack of relationship between siliques and seedlings is not because of variability in dispersal.
Let’s look at population structure in the variability of home pot seedlings, using the 1p gaps to get replication within populations but seedling numbers mostly from local production:
seed_data <- subset(popLer, Treatment == "B" & Generation > 1 & Gap == "1p")
seed_data$GenID <- with(seed_data, interaction(Gen, ID))
summary(aov(Seedlings ~ Gen + GenID, data = seed_data))
Df Sum Sq Mean Sq F value Pr(>F)
Gen 4 31628 7907 5.747 0.000388 ***
GenID 45 178349 3963 2.881 1.41e-05 ***
Residuals 84 115567 1376
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
ggplot(aes(Gen, Seedlings, group = ID, color = ID), data = seed_data) + geom_point() +
geom_smooth(span = 1, se = FALSE)
`geom_smooth()` using method = 'loess'
Let’s look at how the residual variance scales with the mean.
sil_stats <- group_by(seed_data, Gen, ID) %>%
summarise(Mean = mean(Seedlings),
Var = var(Seedlings),
SD = sqrt(Var))
sil_stats <- sil_stats[complete.cases(sil_stats), ]
ggplot(aes(Mean, Var, color = ID), data = sil_stats) + geom_point()
ggplot(aes(Mean, SD, color = ID), data = sil_stats) + geom_point()
ggplot(aes(Mean, Var), data = sil_stats) + geom_point() + geom_smooth(method = "lm")
ggplot(aes(Mean, SD), data = sil_stats) + geom_point() + geom_smooth(method = "lm")
So, despite the lack of relationship between siliques and seedlings, we do see that seedling production is structured by runway, and that the SD scales roughly linearly with the mean (although not in a way that gives a constant CV).
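The non-constant CV follows from the linear SD–mean fit having a nonzero intercept: if SD = c0 + c1·Mean with c0 > 0, then CV = c0/Mean + c1, which declines toward c1 as the mean grows. A quick check, assuming the sil_stats data frame computed just above:

```r
## If SD = c0 + c1*Mean with c0 > 0, then CV = SD/Mean = c0/Mean + c1,
## which declines with the mean (assumes sil_stats from above).
coef(lm(SD ~ Mean, data = sil_stats))
ggplot(aes(Mean, SD / Mean), data = sil_stats) + geom_point()  # CV vs mean
```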
We can also look at population-level structure in the density-dependent populations.
## Nm1 is the previous generation's seedling count in the same pot
seed_data <- group_by(popLer, ID, Pot) %>%
mutate(Nm1 = lag(Seedlings))
seed_data <- subset(seed_data, Treatment == "C" & Generation > 1 & Gap == "1p")
seed_data$GenID <- with(seed_data, interaction(Gen, ID))
anova(lm(log(Seedlings/Nm1) ~ log(Nm1) + Gen + GenID, data = seed_data))
Analysis of Variance Table
Response: log(Seedlings/Nm1)
Df Sum Sq Mean Sq F value Pr(>F)
log(Nm1) 1 359.27 359.27 750.8986 < 2.2e-16 ***
Gen 4 38.89 9.72 20.3221 1.105e-12 ***
GenID 45 33.35 0.74 1.5489 0.0328 *
Residuals 115 55.02 0.48
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
ggplot(aes(log(Nm1), log(Seedlings/Nm1), group = GenID, color = Gen), data = seed_data) +
geom_point() + geom_smooth(method = "lm", se = FALSE)
Warning: Removed 36 rows containing non-finite values (stat_smooth).
Warning: Removed 36 rows containing missing values (geom_point).
So the population-level variation is still here.
Some observations on stochasticity in Ler seed production:
Some things that I still need to do:
Let’s start by just adding treatment B into the density dependence analysis. In order to get a good sample size we’ll use both landscapes 1p and 2p.
## For treatment C, Nm1 is the lagged seedling count; for treatment B it is set
## to 1, so log(Seedlings/Nm1) is just log total production and log(Nm1) drops out.
seed_data <- group_by(popLer, ID, Pot) %>%
mutate(Nm1 = 1 + (Treatment == "C") * (lag(Seedlings) - 1))
seed_data <- subset(seed_data,
Treatment %in% c("B", "C") & Generation > 1 & Gap %in% c("1p", "2p"))
seed_data$GenID <- with(seed_data, interaction(Gen, ID))
seed_data$Nm1[is.na(seed_data$Nm1)] <- 1
seed_data$ID <- as.factor(seed_data$ID)
DD.lm <- lm(log(Seedlings/Nm1) ~ log(Nm1) + Gen * ID, data = seed_data)
anova(DD.lm)
Analysis of Variance Table
Response: log(Seedlings/Nm1)
Df Sum Sq Mean Sq F value Pr(>F)
log(Nm1) 1 1197.23 1197.23 792.6968 < 2.2e-16 ***
Gen 4 85.65 21.41 14.1779 9.618e-11 ***
ID 39 56.53 1.45 0.9597 0.5426
Gen:ID 156 196.33 1.26 0.8333 0.9043
Residuals 354 534.66 1.51
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(DD.lm)
Call:
lm(formula = log(Seedlings/Nm1) ~ log(Nm1) + Gen * ID, data = seed_data)
Residuals:
Min 1Q Median 3Q Max
-3.4725 -0.4679 0.0000 0.6050 2.6645
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.78599 0.87309 2.046 0.0415 *
log(Nm1) -0.43660 0.03108 -14.049 <2e-16 ***
Gen3 1.30529 1.22910 1.062 0.2890
Gen4 1.20304 1.12203 1.072 0.2844
Gen5 0.15684 1.12384 0.140 0.8891
Gen6 0.65412 1.06459 0.614 0.5393
ID2 0.89140 1.50799 0.591 0.5548
ID3 2.43909 1.50644 1.619 0.1063
ID8 0.82874 1.12189 0.739 0.4606
ID10 1.07306 1.22904 0.873 0.3832
ID12 1.55370 1.06446 1.460 0.1453
ID14 0.70985 1.22899 0.578 0.5639
ID16 -0.03284 1.22898 -0.027 0.9787
ID20 1.17331 1.12207 1.046 0.2964
ID24 2.58161 1.50694 1.713 0.0876 .
ID30 1.49876 1.50685 0.995 0.3206
ID37 0.94599 1.22899 0.770 0.4420
ID43 0.91479 1.02872 0.889 0.3745
ID45 1.42381 1.50777 0.944 0.3457
ID58 0.81804 1.12246 0.729 0.4666
ID64 0.37098 1.22907 0.302 0.7630
ID65 1.12223 1.50696 0.745 0.4569
ID68 0.77756 1.12198 0.693 0.4887
ID69 0.78757 1.22896 0.641 0.5220
ID72 1.00936 1.06444 0.948 0.3436
ID73 1.50985 1.50752 1.002 0.3172
ID74 2.16526 1.50752 1.436 0.1518
ID75 2.02068 1.50752 1.340 0.1810
ID78 3.28919 1.50752 2.182 0.0298 *
ID80 3.00980 1.50752 1.997 0.0466 *
ID83 0.12833 1.23185 0.104 0.9171
ID85 0.63815 1.12505 0.567 0.5709
ID87 0.61191 1.50752 0.406 0.6851
ID89 2.78873 1.50752 1.850 0.0652 .
ID90 1.55632 1.23185 1.263 0.2073
ID92 0.85246 1.12505 0.758 0.4491
ID97 0.73894 1.23185 0.600 0.5490
ID98 1.46316 1.23185 1.188 0.2357
ID100 -0.03773 1.23185 -0.031 0.9756
ID101 -0.90630 1.12505 -0.806 0.4210
ID104 2.90279 1.23185 2.356 0.0190 *
ID105 2.88684 1.50752 1.915 0.0563 .
ID106 0.08285 1.23185 0.067 0.9464
ID110 1.10439 1.50752 0.733 0.4643
ID112 0.46555 1.12505 0.414 0.6793
Gen3:ID2 -2.16031 1.94593 -1.110 0.2677
Gen4:ID2 -1.65507 1.81315 -0.913 0.3620
Gen5:ID2 -0.43102 1.78112 -0.242 0.8089
Gen6:ID2 -1.32984 1.73986 -0.764 0.4452
Gen3:ID3 -3.38632 1.94402 -1.742 0.0824 .
Gen4:ID3 -1.98500 1.81076 -1.096 0.2737
Gen5:ID3 -1.38528 1.77649 -0.780 0.4360
Gen6:ID3 -2.98502 1.73805 -1.717 0.0868 .
Gen3:ID8 -1.33328 1.58682 -0.840 0.4013
Gen4:ID8 -1.11609 1.50610 -0.741 0.4592
Gen5:ID8 0.74749 1.50533 0.497 0.6198
Gen6:ID8 -2.72659 1.46619 -1.860 0.0638 .
Gen3:ID10 -1.07110 1.73845 -0.616 0.5382
Gen4:ID10 -1.78773 1.58675 -1.127 0.2606
Gen5:ID10 1.28495 1.58657 0.810 0.4185
Gen6:ID10 -1.29988 1.55062 -0.838 0.4024
Gen3:ID12 -1.21135 1.50620 -0.804 0.4218
Gen4:ID12 -1.90996 1.39370 -1.370 0.1714
Gen5:ID12 -0.50411 1.37402 -0.367 0.7139
Gen6:ID12 -2.52657 1.32983 -1.900 0.0583 .
Gen3:ID14 -0.70204 1.73810 -0.404 0.6865
Gen4:ID14 -0.07674 1.54663 -0.050 0.9605
Gen5:ID14 0.50740 1.54640 0.328 0.7430
Gen6:ID14 -1.09967 1.48022 -0.743 0.4580
Gen3:ID16 0.74469 1.73801 0.428 0.6686
Gen4:ID16 -0.28106 1.54655 -0.182 0.8559
Gen5:ID16 1.21610 1.54660 0.786 0.4322
Gen6:ID16 0.22965 1.50619 0.152 0.8789
Gen3:ID20 -0.97470 1.54646 -0.630 0.5289
Gen4:ID20 -1.13200 1.46463 -0.773 0.4401
Gen5:ID20 0.36067 1.41910 0.254 0.7995
Gen6:ID20 -1.78062 1.36232 -1.307 0.1920
Gen3:ID24 -2.53815 2.12916 -1.192 0.2340
Gen4:ID24 -3.53938 2.06919 -1.711 0.0880 .
Gen5:ID24 -3.06753 1.88171 -1.630 0.1040
Gen6:ID24 -1.83600 1.84533 -0.995 0.3204
Gen3:ID30 -1.77556 2.12867 -0.834 0.4048
Gen4:ID30 -3.66532 1.87884 -1.951 0.0519 .
Gen5:ID30 0.13060 1.88262 0.069 0.9447
Gen6:ID30 -0.96734 1.84359 -0.525 0.6001
Gen3:ID37 -2.06866 1.73832 -1.190 0.2348
Gen4:ID37 -0.97963 1.58658 -0.617 0.5373
Gen5:ID37 0.30129 1.54681 0.195 0.8457
Gen6:ID37 -1.37315 1.48010 -0.928 0.3542
Gen3:ID43 -0.44298 1.45474 -0.305 0.7609
Gen4:ID43 -0.70395 1.36878 -0.514 0.6074
Gen5:ID43 0.30094 1.36712 0.220 0.8259
Gen6:ID43 -2.30305 1.32457 -1.739 0.0830 .
Gen3:ID45 -2.05370 1.88148 -1.092 0.2758
Gen4:ID45 -0.45029 1.81065 -0.249 0.8037
Gen5:ID45 -1.26464 1.77661 -0.712 0.4770
Gen6:ID45 -2.32672 1.71811 -1.354 0.1765
Gen3:ID58 0.12857 1.58711 0.081 0.9355
Gen4:ID58 -0.88546 1.50992 -0.586 0.5580
Gen5:ID58 0.49525 1.50735 0.329 0.7427
Gen6:ID58 -1.54672 1.46931 -1.053 0.2932
Gen3:ID64 -0.63984 1.66409 -0.385 0.7008
Gen4:ID64 -0.58730 1.58711 -0.370 0.7116
Gen5:ID64 1.15752 1.52230 0.760 0.4475
Gen6:ID64 -0.78092 1.48146 -0.527 0.5984
Gen3:ID65 -0.57220 2.12861 -0.269 0.7882
Gen4:ID65 -1.71717 2.06898 -0.830 0.4071
Gen5:ID65 -0.15346 2.06885 -0.074 0.9409
Gen6:ID65 -1.07533 2.03807 -0.528 0.5981
Gen3:ID68 -1.29369 1.58687 -0.815 0.4155
Gen4:ID68 -0.91634 1.50632 -0.608 0.5434
Gen5:ID68 0.01161 1.46276 0.008 0.9937
Gen6:ID68 -1.04929 1.42049 -0.739 0.4606
Gen3:ID69 -0.42327 1.66409 -0.254 0.7994
Gen4:ID69 -1.54499 1.58753 -0.973 0.3311
Gen5:ID69 1.04075 1.54669 0.673 0.5015
Gen6:ID69 -1.03891 1.48042 -0.702 0.4833
Gen3:ID72 -1.00478 1.46275 -0.687 0.4926
Gen4:ID72 -1.22906 1.37521 -0.894 0.3721
Gen5:ID72 -0.46668 1.36089 -0.343 0.7319
Gen6:ID72 -2.07704 1.31545 -1.579 0.1152
Gen3:ID73 -0.64989 2.12869 -0.305 0.7603
Gen4:ID73 -1.60851 2.06872 -0.778 0.4374
Gen5:ID73 2.13831 2.06970 1.033 0.3022
Gen6:ID73 -1.03255 1.84359 -0.560 0.5758
Gen3:ID74 -1.88924 2.12869 -0.888 0.3754
Gen4:ID74 -1.28657 1.87735 -0.685 0.4936
Gen5:ID74 0.39603 1.87843 0.211 0.8331
Gen6:ID74 -1.88424 1.77401 -1.062 0.2889
Gen3:ID75 -0.75525 2.12869 -0.355 0.7230
Gen4:ID75 -0.67897 2.06872 -0.328 0.7429
Gen5:ID75 -0.96777 2.06970 -0.468 0.6404
Gen6:ID75 -2.32245 1.84359 -1.260 0.2086
Gen3:ID78 -3.54725 2.12869 -1.666 0.0965 .
Gen4:ID78 -3.88032 2.06872 -1.876 0.0615 .
Gen5:ID78 -3.18899 1.81019 -1.762 0.0790 .
Gen6:ID78 -2.15341 1.77401 -1.214 0.2256
Gen3:ID80 -3.85118 1.94324 -1.982 0.0483 *
Gen4:ID80 -1.92144 1.87735 -1.023 0.3068
Gen5:ID80 -1.06104 1.81019 -0.586 0.5581
Gen6:ID80 -1.60978 1.77401 -0.907 0.3648
Gen3:ID83 0.66062 1.73810 0.380 0.7041
Gen4:ID83 1.41472 1.66411 0.850 0.3958
Gen5:ID83 1.00308 1.66533 0.602 0.5473
Gen6:ID83 -0.16410 1.54659 -0.106 0.9156
Gen3:ID85 -0.03199 1.58668 -0.020 0.9839
Gen4:ID85 0.56258 1.50527 0.374 0.7088
Gen5:ID85 0.33365 1.50662 0.221 0.8249
Gen6:ID85 -0.01573 1.46295 -0.011 0.9914
Gen3:ID87 0.01038 2.12869 0.005 0.9961
Gen4:ID87 -0.65650 2.06872 -0.317 0.7512
Gen5:ID87 -0.53321 1.87843 -0.284 0.7767
Gen6:ID87 -1.26026 1.84359 -0.684 0.4947
Gen3:ID89 -2.02986 2.12869 -0.954 0.3410
Gen4:ID89 -1.92760 2.06872 -0.932 0.3521
Gen5:ID89 -1.78711 2.06970 -0.863 0.3885
Gen6:ID89 -2.18431 2.03814 -1.072 0.2846
Gen3:ID90 -1.08191 1.66412 -0.650 0.5160
Gen4:ID90 -1.99026 1.54651 -1.287 0.1990
Gen5:ID90 0.58267 1.52323 0.383 0.7023
Gen6:ID90 -1.52391 1.46295 -1.042 0.2983
Gen3:ID92 -0.59698 1.58668 -0.376 0.7070
Gen4:ID92 -0.41567 1.50527 -0.276 0.7826
Gen5:ID92 2.20921 1.46425 1.509 0.1323
Gen6:ID92 -0.78100 1.39243 -0.561 0.5752
Gen3:ID97 -0.52622 1.73810 -0.303 0.7623
Gen4:ID97 0.43335 1.66411 0.260 0.7947
Gen5:ID97 0.94320 1.66533 0.566 0.5715
Gen6:ID97 0.27881 1.62594 0.171 0.8639
Gen3:ID98 -0.90303 1.66412 -0.543 0.5877
Gen4:ID98 -1.95525 1.54651 -1.264 0.2070
Gen5:ID98 -1.10618 1.54782 -0.715 0.4753
Gen6:ID98 -1.95746 1.50536 -1.300 0.1943
Gen3:ID100 0.84551 1.73810 0.486 0.6269
Gen4:ID100 0.64140 1.66411 0.385 0.7002
Gen5:ID100 1.58727 1.66533 0.953 0.3412
Gen6:ID100 0.69073 1.62594 0.425 0.6712
Gen3:ID101 1.86031 1.58668 1.172 0.2418
Gen4:ID101 1.93286 1.50527 1.284 0.2000
Gen5:ID101 2.29557 1.50662 1.524 0.1285
Gen6:ID101 1.82846 1.46295 1.250 0.2122
Gen3:ID104 -1.85765 1.73810 -1.069 0.2859
Gen4:ID104 -3.17278 1.66411 -1.907 0.0574 .
Gen5:ID104 -1.99215 1.54782 -1.287 0.1989
Gen6:ID104 -2.98978 1.50536 -1.986 0.0478 *
Gen3:ID105 -1.02229 2.12869 -0.480 0.6313
Gen4:ID105 -2.09168 2.06872 -1.011 0.3127
Gen5:ID105 -0.34103 2.06970 -0.165 0.8692
Gen6:ID105 -0.98315 2.03814 -0.482 0.6298
Gen3:ID106 -0.41737 1.66412 -0.251 0.8021
Gen4:ID106 0.54158 1.58668 0.341 0.7331
Gen5:ID106 1.44681 1.54782 0.935 0.3506
Gen6:ID106 -0.34875 1.50536 -0.232 0.8169
Gen3:ID110 -0.66930 2.12869 -0.314 0.7534
Gen4:ID110 -0.61371 1.87735 -0.327 0.7439
Gen5:ID110 0.68361 1.87843 0.364 0.7161
Gen6:ID110 -0.87094 1.84359 -0.472 0.6369
Gen3:ID112 -0.09026 1.58668 -0.057 0.9547
Gen4:ID112 -0.06853 1.50527 -0.046 0.9637
Gen5:ID112 1.11058 1.46425 0.758 0.4487
Gen6:ID112 -0.39053 1.41929 -0.275 0.7834
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.229 on 354 degrees of freedom
Multiple R-squared: 0.7418, Adjusted R-squared: 0.5959
F-statistic: 5.084 on 200 and 354 DF, p-value: < 2.2e-16
ggplot(aes(log(Nm1), log(Seedlings/Nm1), group = GenID, color = Gen), data = seed_data) +
geom_point() + geom_smooth(method = "lm", se = FALSE)
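The fitted form is Gompertz density dependence, log(N_t/N_{t-1}) = a + b·log(N_{t-1}), so growth is zero where log(N*) = −a/b. Plugging in the baseline intercept and slope from DD.lm gives a rough equilibrium; this ignores the Gen and ID effects (and mixes the B and C treatments), so treat it only as a ballpark figure.

```r
## Gompertz equilibrium sketch: a + b*log(N*) = 0 => N* = exp(-a/b).
## Baseline intercept and slope from DD.lm above; ignores Gen and ID effects.
a <- 1.78599; b <- -0.43660
exp(-a / b)  # roughly 60 seedlings per pot
```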
All right, now let’s try a glm with a quasi-Poisson family:
DD.glm <- glm(Seedlings ~ log(Nm1) + Gen * ID, data = seed_data, family = quasipoisson)
car::Anova(DD.glm)
Analysis of Deviance Table (Type II tests)
Response: Seedlings
LR Chisq Df Pr(>Chisq)
log(Nm1) 431.29 1 < 2.2e-16 ***
Gen 192.69 4 < 2.2e-16 ***
ID 78.50 39 0.0001811 ***
Gen:ID 223.80 156 0.0003043 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(DD.glm)
Call:
glm(formula = Seedlings ~ log(Nm1) + Gen * ID, family = quasipoisson,
data = seed_data)
Deviance Residuals:
Min 1Q Median 3Q Max
-25.346 -5.058 0.000 2.938 31.110
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.043179 0.451232 8.960 < 2e-16 ***
log(Nm1) 0.320337 0.017984 17.812 < 2e-16 ***
Gen3 0.508582 0.544455 0.934 0.350883
Gen4 0.553890 0.509055 1.088 0.277302
Gen5 -0.190163 0.539657 -0.352 0.724765
Gen6 -0.182980 0.540954 -0.338 0.735373
ID2 0.017309 0.637472 0.027 0.978353
ID3 1.328772 0.523153 2.540 0.011514 *
ID8 0.247817 0.550162 0.450 0.652666
ID10 0.114426 0.648556 0.176 0.860056
ID12 0.318970 0.538649 0.592 0.554118
ID14 -0.098951 0.618344 -0.160 0.872953
ID16 0.072229 0.597492 0.121 0.903850
ID20 -0.183721 0.628544 -0.292 0.770231
ID24 1.557990 0.501370 3.107 0.002040 **
ID30 0.461067 0.602790 0.765 0.444848
ID37 0.350891 0.598379 0.586 0.557979
ID43 0.396803 0.528499 0.751 0.453265
ID45 0.520991 0.573021 0.909 0.363863
ID58 0.065428 0.623718 0.105 0.916514
ID64 0.419852 0.607499 0.691 0.489946
ID65 0.101626 0.655837 0.155 0.876944
ID68 -0.108707 0.616558 -0.176 0.860150
ID69 -0.160725 0.664430 -0.242 0.808998
ID72 0.275853 0.537597 0.513 0.608186
ID73 -0.747343 1.728171 -0.432 0.665680
ID74 -0.091936 1.283982 -0.072 0.942959
ID75 -0.236517 1.368718 -0.173 0.862906
ID78 1.031994 0.820509 1.258 0.209312
ID80 0.752611 0.908076 0.829 0.407778
ID83 -1.517451 1.791424 -0.847 0.397532
ID85 -0.251443 0.876600 -0.287 0.774404
ID87 -1.645284 2.652267 -0.620 0.535439
ID89 0.531531 0.989065 0.537 0.591323
ID90 0.369619 0.811782 0.455 0.649160
ID92 -0.379618 0.919689 -0.413 0.680027
ID97 -0.728993 1.252908 -0.582 0.561044
ID98 -0.225467 1.014555 -0.222 0.824261
ID100 -2.097269 2.360245 -0.889 0.374832
ID101 -2.369203 2.213498 -1.070 0.285194
ID104 0.812749 0.704261 1.154 0.249260
ID105 0.629649 0.951763 0.662 0.508683
ID106 -0.975127 1.396775 -0.698 0.485556
ID110 -1.152808 2.092381 -0.551 0.582012
ID112 -1.014657 1.189767 -0.853 0.394335
Gen3:ID2 -0.725818 0.847518 -0.856 0.392353
Gen4:ID2 0.188788 0.733663 0.257 0.797080
Gen5:ID2 0.048529 0.777082 0.062 0.950240
Gen6:ID2 -0.418015 0.800906 -0.522 0.602047
Gen3:ID3 -1.626290 0.692556 -2.348 0.019412 *
Gen4:ID3 -1.126329 0.626939 -1.797 0.073259 .
Gen5:ID3 -0.842500 0.648192 -1.300 0.194526
Gen6:ID3 -1.350227 0.662340 -2.039 0.042236 *
Gen3:ID8 -0.366896 0.682695 -0.537 0.591314
Gen4:ID8 -0.934527 0.673860 -1.387 0.166367
Gen5:ID8 0.763515 0.655545 1.165 0.244923
Gen6:ID8 -1.804249 0.837205 -2.155 0.031829 *
Gen3:ID10 -0.419608 0.803372 -0.522 0.601782
Gen4:ID10 -0.094649 0.735702 -0.129 0.897706
Gen5:ID10 1.060966 0.737561 1.438 0.151182
Gen6:ID10 0.020512 0.755992 0.027 0.978369
Gen3:ID12 -0.378844 0.658266 -0.576 0.565307
Gen4:ID12 -0.958024 0.635016 -1.509 0.132277
Gen5:ID12 0.357942 0.641027 0.558 0.576933
Gen6:ID12 -1.262101 0.701813 -1.798 0.072974 .
Gen3:ID14 0.121812 0.754102 0.162 0.871766
Gen4:ID14 -0.512795 0.736078 -0.697 0.486474
Gen5:ID14 0.780078 0.717626 1.087 0.277765
Gen6:ID14 -0.045252 0.735141 -0.062 0.950952
Gen3:ID16 -0.061494 0.741496 -0.083 0.933952
Gen4:ID16 -1.202151 0.759309 -1.583 0.114265
Gen5:ID16 0.342530 0.717455 0.477 0.633355
Gen6:ID16 0.100265 0.712884 0.141 0.888229
Gen3:ID20 -0.151832 0.760273 -0.200 0.841824
Gen4:ID20 0.257355 0.700666 0.367 0.713614
Gen5:ID20 1.222094 0.712657 1.715 0.087249 .
Gen6:ID20 0.079012 0.729305 0.108 0.913788
Gen3:ID24 -1.218430 0.638428 -1.908 0.057137 .
Gen4:ID24 -2.372220 0.695935 -3.409 0.000728 ***
Gen5:ID24 -1.147753 0.690829 -1.661 0.097516 .
Gen6:ID24 -1.393997 0.709023 -1.966 0.050070 .
Gen3:ID30 -0.712891 0.779117 -0.915 0.360815
Gen4:ID30 -2.012935 0.922025 -2.183 0.029680 *
Gen5:ID30 0.041165 0.783026 0.053 0.958103
Gen6:ID30 -0.181481 0.756519 -0.240 0.810554
Gen3:ID37 -1.148642 0.807422 -1.423 0.155732
Gen4:ID37 -0.211171 0.697251 -0.303 0.762172
Gen5:ID37 0.222909 0.709411 0.314 0.753541
Gen6:ID37 -0.468397 0.726377 -0.645 0.519449
Gen3:ID43 -0.394548 0.645279 -0.611 0.541301
Gen4:ID43 -0.235858 0.602237 -0.392 0.695563
Gen5:ID43 0.460569 0.625507 0.736 0.462028
Gen6:ID43 -1.422365 0.692022 -2.055 0.040576 *
Gen3:ID45 -1.286955 0.777725 -1.655 0.098857 .
Gen4:ID45 -0.415576 0.672081 -0.618 0.536748
Gen5:ID45 -0.282137 0.693140 -0.407 0.684223
Gen6:ID45 -1.031474 0.734838 -1.404 0.161292
Gen3:ID58 0.312379 0.732627 0.426 0.670088
Gen4:ID58 -0.298223 0.705544 -0.423 0.672782
Gen5:ID58 0.897457 0.714474 1.256 0.209905
Gen6:ID58 -0.269180 0.748447 -0.360 0.719322
Gen3:ID64 -0.025388 0.725404 -0.035 0.972101
Gen4:ID64 -1.448796 0.752154 -1.926 0.054880 .
Gen5:ID64 -0.005864 0.723116 -0.008 0.993534
Gen6:ID64 -0.960080 0.750582 -1.279 0.201695
Gen3:ID65 0.391479 0.778362 0.503 0.615310
Gen4:ID65 -0.628793 0.796096 -0.790 0.430147
Gen5:ID65 0.483003 0.801865 0.602 0.547327
Gen6:ID65 0.092653 0.831419 0.111 0.911331
Gen3:ID68 -0.650878 0.787310 -0.827 0.408957
Gen4:ID68 -0.517148 0.735852 -0.703 0.482650
Gen5:ID68 0.511472 0.728950 0.702 0.483355
Gen6:ID68 -0.089564 0.748860 -0.120 0.904867
Gen3:ID69 0.678626 0.769246 0.882 0.378269
Gen4:ID69 -1.151739 0.812405 -1.418 0.157161
Gen5:ID69 0.913132 0.763121 1.197 0.232273
Gen6:ID69 0.041416 0.775749 0.053 0.957452
Gen3:ID72 -0.107015 0.650276 -0.165 0.869378
Gen4:ID72 -1.128138 0.636728 -1.772 0.077292 .
Gen5:ID72 0.059203 0.644784 0.092 0.926895
Gen6:ID72 -1.197553 0.694305 -1.725 0.085432 .
Gen3:ID73 0.146825 2.127063 0.069 0.945007
Gen4:ID73 -0.959355 2.686363 -0.357 0.721214
Gen5:ID73 2.485313 1.831548 1.357 0.175663
Gen6:ID73 1.347287 1.873472 0.719 0.472530
Gen3:ID74 -1.092530 2.081458 -0.525 0.599990
Gen4:ID74 -0.462082 1.537290 -0.301 0.763909
Gen5:ID74 0.744269 1.466755 0.507 0.612172
Gen6:ID74 -0.412529 1.615968 -0.255 0.798653
Gen3:ID75 0.041464 1.711591 0.024 0.980686
Gen4:ID75 -0.029819 1.708098 -0.017 0.986081
Gen5:ID75 -0.620767 2.391236 -0.260 0.795322
Gen6:ID75 -1.225788 2.319023 -0.529 0.597428
Gen3:ID78 -2.750542 2.277292 -1.208 0.227926
Gen4:ID78 -3.231168 2.749486 -1.175 0.240709
Gen5:ID78 -1.872749 1.411594 -1.327 0.185466
Gen6:ID78 -1.308675 1.207476 -1.084 0.279186
Gen3:ID80 -2.147372 1.586228 -1.354 0.176677
Gen4:ID80 -1.272143 1.231628 -1.033 0.302358
Gen5:ID80 0.145085 1.062433 0.137 0.891457
Gen6:ID80 -0.720991 1.193641 -0.604 0.546213
Gen3:ID83 0.847253 2.019075 0.420 0.675014
Gen4:ID83 1.695294 1.892558 0.896 0.370985
Gen5:ID83 0.755477 2.237085 0.338 0.735785
Gen6:ID83 0.977479 2.050248 0.477 0.633826
Gen3:ID85 -0.539119 1.201553 -0.449 0.653934
Gen4:ID85 -0.111520 1.089472 -0.102 0.918527
Gen5:ID85 -0.330739 1.344317 -0.246 0.805804
Gen6:ID85 -0.532982 1.418961 -0.376 0.707428
Gen3:ID87 0.807095 2.993330 0.270 0.787601
Gen4:ID87 -0.007346 3.323370 -0.002 0.998238
Gen5:ID87 1.159563 2.901308 0.400 0.689641
Gen6:ID87 -0.343113 3.592151 -0.096 0.923958
Gen3:ID89 -1.233145 1.633951 -0.755 0.450930
Gen4:ID89 -1.278453 1.622498 -0.788 0.431251
Gen5:ID89 -1.440109 2.240671 -0.643 0.520826
Gen6:ID89 -1.347209 2.155311 -0.625 0.532331
Gen3:ID90 -1.232501 1.173873 -1.050 0.294460
Gen4:ID90 -2.327631 1.433981 -1.623 0.105438
Gen5:ID90 -0.063752 0.990668 -0.064 0.948725
Gen6:ID90 -1.339447 1.201562 -1.115 0.265712
Gen3:ID92 -0.782119 1.335260 -0.586 0.558421
Gen4:ID92 -0.710894 1.285595 -0.553 0.580634
Gen5:ID92 1.566795 1.027148 1.525 0.128056
Gen6:ID92 -0.261706 1.239341 -0.211 0.832880
Gen3:ID97 -0.267420 1.654094 -0.162 0.871657
Gen4:ID97 0.380419 1.470381 0.259 0.796000
Gen5:ID97 0.637177 1.590949 0.401 0.689029
Gen6:ID97 0.365301 1.672508 0.218 0.827232
Gen3:ID98 -0.662733 1.328294 -0.499 0.618135
Gen4:ID98 -1.269260 1.388927 -0.914 0.361422
Gen5:ID98 -0.737178 1.469882 -0.502 0.616316
Gen6:ID98 -1.129207 1.628438 -0.693 0.488495
Gen3:ID100 1.457531 2.532780 0.575 0.565341
Gen4:ID100 1.111118 2.577147 0.431 0.666627
Gen5:ID100 2.321790 2.509014 0.925 0.355399
Gen6:ID100 2.009830 2.554105 0.787 0.431866
Gen3:ID101 1.900613 2.326914 0.817 0.414595
Gen4:ID101 1.815185 2.322609 0.782 0.435014
Gen5:ID101 1.883482 2.418857 0.779 0.436697
Gen6:ID101 2.146589 2.376492 0.903 0.367002
Gen3:ID104 -1.167309 1.074136 -1.087 0.277890
Gen4:ID104 -2.606458 1.681809 -1.550 0.122084
Gen5:ID104 -1.139406 1.065882 -1.069 0.285807
Gen6:ID104 -2.033892 1.388089 -1.465 0.143741
Gen3:ID105 -0.225584 1.236054 -0.183 0.855292
Gen4:ID105 -1.442529 1.633741 -0.883 0.377857
Gen5:ID105 0.005970 1.355634 0.004 0.996489
Gen6:ID105 -0.146044 1.403823 -0.104 0.917202
Gen3:ID106 -0.220900 1.708075 -0.129 0.897173
Gen4:ID106 0.006833 1.634461 0.004 0.996667
Gen5:ID106 1.619138 1.499301 1.080 0.280909
Gen6:ID106 -0.144233 1.803312 -0.080 0.936296
Gen3:ID110 0.127407 2.584739 0.049 0.960714
Gen4:ID110 0.405886 2.287563 0.177 0.859271
Gen5:ID110 1.083981 2.306413 0.470 0.638655
Gen6:ID110 0.313033 2.509932 0.125 0.900818
Gen3:ID112 0.117874 1.468375 0.080 0.936064
Gen4:ID112 -0.027064 1.478746 -0.018 0.985408
Gen5:ID112 1.032842 1.376415 0.750 0.453521
Gen6:ID112 0.267973 1.529522 0.175 0.861022
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for quasipoisson family taken to be 75.14003)
Null deviance: 163903 on 554 degrees of freedom
Residual deviance: 27271 on 354 degrees of freedom
AIC: NA
Number of Fisher Scoring iterations: 5
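The dispersion parameter of ~75 says the conditional variance is about 75 times the Poisson expectation—massive overdispersion. It is estimated from the Pearson residuals, which can be reproduced by hand:

```r
## The quasi-Poisson dispersion parameter is the Pearson chi-square divided by
## the residual df; this should reproduce the ~75 reported in the summary.
sum(residuals(DD.glm, type = "pearson")^2) / df.residual(DD.glm)
```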
Let’s look at the residuals:
seed_data$Fitted <- fitted(DD.glm)
seed_data <- mutate(seed_data,
resid2 = ((Seedlings/Nm1) - (Fitted/Nm1))^2)
ggplot(aes(1/Nm1, resid2), data = seed_data) + geom_point() + scale_y_log10() +
geom_smooth()
`geom_smooth()` using method = 'loess'
summary(lm(resid2 ~ I(1/Nm1), data = seed_data))
Call:
lm(formula = resid2 ~ I(1/Nm1), data = seed_data)
Residuals:
Min 1Q Median 3Q Max
-2824 -2546 -56 5 272077
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -11.63 777.21 -0.015 0.9881
I(1/Nm1) 2835.73 1063.60 2.666 0.0079 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 12130 on 553 degrees of freedom
Multiple R-squared: 0.01269, Adjusted R-squared: 0.01091
F-statistic: 7.108 on 1 and 553 DF, p-value: 0.007896
The regression intercept is indistinguishable from zero, suggesting that none of the among-pot variation is due to “environmental stochasticity.” However, the log plot reveals a bunch of cases with a residual of zero; inspection of the dataset reveals that these are cases with only a single pot (i.e., singleton values of GenID). I need to figure out how to drop those from the dataset before the analysis!
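The logic behind regressing the squared per-capita residuals on 1/Nm1 is that demographic stochasticity contributes variance ~sigma_d^2/N to the per-capita growth rate while environmental stochasticity contributes a constant sigma_e^2, so the intercept estimates the environmental component. A simulated sanity check (assumed lambda, not the real data) with purely demographic (Poisson) reproduction should give an intercept near zero:

```r
## Variance-partition sketch: with purely Poisson reproduction,
## Var(N'/N) = lambda/N, so squared per-capita residuals regressed on 1/N
## give intercept ~0 and slope ~lambda; environmental noise would instead
## shift the intercept up by sigma_e^2.
set.seed(2)
N <- sample(10:100, 2000, replace = TRUE)
lambda <- 5
Nnext <- rpois(length(N), lambda * N)   # demographic stochasticity only
r2 <- (Nnext / N - lambda)^2            # squared per-capita residuals
coef(lm(r2 ~ I(1/N)))                   # intercept ~0, slope ~lambda
```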
GenID_counts <- table(seed_data$GenID)
singletons <- rownames(GenID_counts)[GenID_counts == 1]
seed_data <- droplevels(seed_data[-match(singletons, seed_data$GenID), ])
DD.lm <- lm(log(Seedlings/Nm1) ~ log(Nm1) + Gen * ID, data = seed_data)
anova(DD.lm)
Analysis of Variance Table
Response: log(Seedlings/Nm1)
Df Sum Sq Mean Sq F value Pr(>F)
log(Nm1) 1 1072.27 1072.27 709.9570 < 2.2e-16 ***
Gen 4 88.87 22.22 14.7102 3.97e-11 ***
ID 36 42.11 1.17 0.7745 0.8234
Gen:ID 117 163.81 1.40 0.9270 0.6820
Residuals 354 534.66 1.51
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(DD.lm)
Call:
lm(formula = log(Seedlings/Nm1) ~ log(Nm1) + Gen * ID, data = seed_data)
Residuals:
Min 1Q Median 3Q Max
-3.4725 -0.5346 0.0608 0.6502 2.6645
Coefficients: (27 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.78599 0.87309 2.046 0.0415 *
log(Nm1) -0.43660 0.03108 -14.049 <2e-16 ***
Gen3 1.30529 1.22910 1.062 0.2890
Gen4 1.20304 1.12203 1.072 0.2844
Gen5 0.15684 1.12384 0.140 0.8891
Gen6 0.65412 1.06459 0.614 0.5393
ID2 -0.43844 0.86909 -0.504 0.6142
ID3 -0.54593 0.87041 -0.627 0.5309
ID8 0.82874 1.12189 0.739 0.4606
ID10 1.07306 1.22904 0.873 0.3832
ID12 1.55370 1.06446 1.460 0.1453
ID14 0.70985 1.22899 0.578 0.5639
ID16 -0.03284 1.22898 -0.027 0.9787
ID20 1.17331 1.12207 1.046 0.2964
ID24 0.74561 1.06436 0.701 0.4841
ID30 0.53141 1.06535 0.499 0.6182
ID37 0.94599 1.22899 0.770 0.4420
ID43 0.91479 1.02872 0.889 0.3745
ID45 -0.90291 0.82443 -1.095 0.2742
ID58 0.81804 1.12246 0.729 0.4666
ID64 0.37098 1.22907 0.302 0.7630
ID68 0.77756 1.12198 0.693 0.4887
ID69 0.78757 1.22896 0.641 0.5220
ID72 1.00936 1.06444 0.948 0.3436
ID73 0.47730 1.06987 0.446 0.6558
ID74 0.28102 0.94494 0.297 0.7663
ID75 -0.30177 1.06987 -0.282 0.7781
ID78 1.13578 0.94494 1.202 0.2302
ID80 1.40002 0.94494 1.482 0.1393
ID83 0.12833 1.23185 0.104 0.9171
ID85 0.63815 1.12505 0.567 0.5709
ID87 -0.64835 1.06987 -0.606 0.5449
ID90 1.55632 1.23185 1.263 0.2073
ID92 0.85246 1.12505 0.758 0.4491
ID97 0.73894 1.23185 0.600 0.5490
ID98 1.46316 1.23185 1.188 0.2357
ID100 -0.03773 1.23185 -0.031 0.9756
ID101 -0.90630 1.12505 -0.806 0.4210
ID104 2.90279 1.23185 2.356 0.0190 *
ID106 0.08285 1.23185 0.067 0.9464
ID110 0.23345 1.06987 0.218 0.8274
ID112 0.46555 1.12505 0.414 0.6793
Gen3:ID2 -0.83047 1.50534 -0.552 0.5815
Gen4:ID2 -0.32523 1.32811 -0.245 0.8067
Gen5:ID2 0.89882 1.28167 0.701 0.4836
Gen6:ID2 NA NA NA NA
Gen3:ID3 -0.40130 1.50585 -0.266 0.7900
Gen4:ID3 1.00002 1.32914 0.752 0.4523
Gen5:ID3 1.59974 1.28191 1.248 0.2129
Gen6:ID3 NA NA NA NA
Gen3:ID8 -1.33328 1.58682 -0.840 0.4013
Gen4:ID8 -1.11609 1.50610 -0.741 0.4592
Gen5:ID8 0.74749 1.50533 0.497 0.6198
Gen6:ID8 -2.72659 1.46619 -1.860 0.0638 .
Gen3:ID10 -1.07110 1.73845 -0.616 0.5382
Gen4:ID10 -1.78773 1.58675 -1.127 0.2606
Gen5:ID10 1.28495 1.58657 0.810 0.4185
Gen6:ID10 -1.29988 1.55062 -0.838 0.4024
Gen3:ID12 -1.21135 1.50620 -0.804 0.4218
Gen4:ID12 -1.90996 1.39370 -1.370 0.1714
Gen5:ID12 -0.50411 1.37402 -0.367 0.7139
Gen6:ID12 -2.52657 1.32983 -1.900 0.0583 .
Gen3:ID14 -0.70204 1.73810 -0.404 0.6865
Gen4:ID14 -0.07674 1.54663 -0.050 0.9605
Gen5:ID14 0.50740 1.54640 0.328 0.7430
Gen6:ID14 -1.09967 1.48022 -0.743 0.4580
Gen3:ID16 0.74469 1.73801 0.428 0.6686
Gen4:ID16 -0.28106 1.54655 -0.182 0.8559
Gen5:ID16 1.21610 1.54660 0.786 0.4322
Gen6:ID16 0.22965 1.50619 0.152 0.8789
Gen3:ID20 -0.97470 1.54646 -0.630 0.5289
Gen4:ID20 -1.13200 1.46463 -0.773 0.4401
Gen5:ID20 0.36067 1.41910 0.254 0.7995
Gen6:ID20 -1.78062 1.36232 -1.307 0.1920
Gen3:ID24 NA NA NA NA
Gen4:ID24 NA NA NA NA
Gen5:ID24 -1.23153 1.54708 -0.796 0.4265
Gen6:ID24 NA NA NA NA
Gen3:ID30 NA NA NA NA
Gen4:ID30 -2.69798 1.54730 -1.744 0.0821 .
Gen5:ID30 1.09795 1.55087 0.708 0.4794
Gen6:ID30 NA NA NA NA
Gen3:ID37 -2.06866 1.73832 -1.190 0.2348
Gen4:ID37 -0.97963 1.58658 -0.617 0.5373
Gen5:ID37 0.30129 1.54681 0.195 0.8457
Gen6:ID37 -1.37315 1.48010 -0.928 0.3542
Gen3:ID43 -0.44298 1.45474 -0.305 0.7609
Gen4:ID43 -0.70395 1.36878 -0.514 0.6074
Gen5:ID43 0.30094 1.36712 0.220 0.8259
Gen6:ID43 -2.30305 1.32457 -1.739 0.0830 .
Gen3:ID45 0.27303 1.39290 0.196 0.8447
Gen4:ID45 1.87643 1.29867 1.445 0.1494
Gen5:ID45 1.06209 1.24938 0.850 0.3959
Gen6:ID45 NA NA NA NA
Gen3:ID58 0.12857 1.58711 0.081 0.9355
Gen4:ID58 -0.88546 1.50992 -0.586 0.5580
Gen5:ID58 0.49525 1.50735 0.329 0.7427
Gen6:ID58 -1.54672 1.46931 -1.053 0.2932
Gen3:ID64 -0.63984 1.66409 -0.385 0.7008
Gen4:ID64 -0.58730 1.58711 -0.370 0.7116
Gen5:ID64 1.15752 1.52230 0.760 0.4475
Gen6:ID64 -0.78092 1.48146 -0.527 0.5984
Gen3:ID68 -1.29369 1.58687 -0.815 0.4155
Gen4:ID68 -0.91634 1.50632 -0.608 0.5434
Gen5:ID68 0.01161 1.46276 0.008 0.9937
Gen6:ID68 -1.04929 1.42049 -0.739 0.4606
Gen3:ID69 -0.42327 1.66409 -0.254 0.7994
Gen4:ID69 -1.54499 1.58753 -0.973 0.3311
Gen5:ID69 1.04075 1.54669 0.673 0.5015
Gen6:ID69 -1.03891 1.48042 -0.702 0.4833
Gen3:ID72 -1.00478 1.46275 -0.687 0.4926
Gen4:ID72 -1.22906 1.37521 -0.894 0.3721
Gen5:ID72 -0.46668 1.36089 -0.343 0.7319
Gen6:ID72 -2.07704 1.31545 -1.579 0.1152
Gen3:ID73 NA NA NA NA
Gen4:ID73 NA NA NA NA
Gen5:ID73 NA NA NA NA
Gen6:ID73 NA NA NA NA
Gen3:ID74 NA NA NA NA
Gen4:ID74 0.59767 1.46276 0.409 0.6831
Gen5:ID74 2.28027 1.46334 1.558 0.1201
Gen6:ID74 NA NA NA NA
Gen3:ID75 NA NA NA NA
Gen4:ID75 NA NA NA NA
Gen5:ID75 NA NA NA NA
Gen6:ID75 NA NA NA NA
Gen3:ID78 NA NA NA NA
Gen4:ID78 NA NA NA NA
Gen5:ID78 -1.03559 1.37465 -0.753 0.4517
Gen6:ID78 NA NA NA NA
Gen3:ID80 -2.24140 1.54641 -1.449 0.1481
Gen4:ID80 -0.31165 1.46276 -0.213 0.8314
Gen5:ID80 0.54875 1.37465 0.399 0.6900
Gen6:ID80 NA NA NA NA
Gen3:ID83 0.66062 1.73810 0.380 0.7041
Gen4:ID83 1.41472 1.66411 0.850 0.3958
Gen5:ID83 1.00308 1.66533 0.602 0.5473
Gen6:ID83 -0.16410 1.54659 -0.106 0.9156
Gen3:ID85 -0.03199 1.58668 -0.020 0.9839
Gen4:ID85 0.56258 1.50527 0.374 0.7088
Gen5:ID85 0.33365 1.50662 0.221 0.8249
Gen6:ID85 -0.01573 1.46295 -0.011 0.9914
Gen3:ID87 NA NA NA NA
Gen4:ID87 NA NA NA NA
Gen5:ID87 0.72705 1.54696 0.470 0.6387
Gen6:ID87 NA NA NA NA
Gen3:ID90 -1.08191 1.66412 -0.650 0.5160
Gen4:ID90 -1.99026 1.54651 -1.287 0.1990
Gen5:ID90 0.58267 1.52323 0.383 0.7023
Gen6:ID90 -1.52391 1.46295 -1.042 0.2983
Gen3:ID92 -0.59698 1.58668 -0.376 0.7070
Gen4:ID92 -0.41567 1.50527 -0.276 0.7826
Gen5:ID92 2.20921 1.46425 1.509 0.1323
Gen6:ID92 -0.78100 1.39243 -0.561 0.5752
Gen3:ID97 -0.52622 1.73810 -0.303 0.7623
Gen4:ID97 0.43335 1.66411 0.260 0.7947
Gen5:ID97 0.94320 1.66533 0.566 0.5715
Gen6:ID97 0.27881 1.62594 0.171 0.8639
Gen3:ID98 -0.90303 1.66412 -0.543 0.5877
Gen4:ID98 -1.95525 1.54651 -1.264 0.2070
Gen5:ID98 -1.10618 1.54782 -0.715 0.4753
Gen6:ID98 -1.95746 1.50536 -1.300 0.1943
Gen3:ID100 0.84551 1.73810 0.486 0.6269
Gen4:ID100 0.64140 1.66411 0.385 0.7002
Gen5:ID100 1.58727 1.66533 0.953 0.3412
Gen6:ID100 0.69073 1.62594 0.425 0.6712
Gen3:ID101 1.86031 1.58668 1.172 0.2418
Gen4:ID101 1.93286 1.50527 1.284 0.2000
Gen5:ID101 2.29557 1.50662 1.524 0.1285
Gen6:ID101 1.82846 1.46295 1.250 0.2122
Gen3:ID104 -1.85765 1.73810 -1.069 0.2859
Gen4:ID104 -3.17278 1.66411 -1.907 0.0574 .
Gen5:ID104 -1.99215 1.54782 -1.287 0.1989
Gen6:ID104 -2.98978 1.50536 -1.986 0.0478 *
Gen3:ID106 -0.41737 1.66412 -0.251 0.8021
Gen4:ID106 0.54158 1.58668 0.341 0.7331
Gen5:ID106 1.44681 1.54782 0.935 0.3506
Gen6:ID106 -0.34875 1.50536 -0.232 0.8169
Gen3:ID110 NA NA NA NA
Gen4:ID110 0.25723 1.54641 0.166 0.8680
Gen5:ID110 1.55455 1.54696 1.005 0.3156
Gen6:ID110 NA NA NA NA
Gen3:ID112 -0.09026 1.58668 -0.057 0.9547
Gen4:ID112 -0.06853 1.50527 -0.046 0.9637
Gen5:ID112 1.11058 1.46425 0.758 0.4487
Gen6:ID112 -0.39053 1.41929 -0.275 0.7834
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.229 on 354 degrees of freedom
Multiple R-squared: 0.7189, Adjusted R-squared: 0.5934
F-statistic: 5.729 on 158 and 354 DF, p-value: < 2.2e-16
ggplot(aes(log(Nm1), log(Seedlings/Nm1), group = GenID, color = Gen), data = seed_data) +
geom_point() + geom_smooth(method = "lm", se = FALSE)
DD.glm <- glm(Seedlings ~ log(Nm1) + Gen * ID, data = seed_data, family = quasipoisson)
car::Anova(DD.glm)
Analysis of Deviance Table (Type II tests)
Response: Seedlings
LR Chisq Df Pr(>Chisq)
log(Nm1) 431.29 1 < 2.2e-16 ***
Gen 185.62 4 < 2.2e-16 ***
ID 67.75 36 0.0010645 **
Gen:ID 175.90 117 0.0003525 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(DD.glm)
Call:
glm(formula = Seedlings ~ log(Nm1) + Gen * ID, family = quasipoisson,
data = seed_data)
Deviance Residuals:
Min 1Q Median 3Q Max
-25.3458 -5.7571 -0.7389 3.1541 31.1101
Coefficients: (27 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.043179 0.451232 8.960 <2e-16 ***
log(Nm1) 0.320337 0.017984 17.812 <2e-16 ***
Gen3 0.508582 0.544455 0.934 0.3509
Gen4 0.553890 0.509055 1.088 0.2773
Gen5 -0.190163 0.539657 -0.352 0.7248
Gen6 -0.182980 0.540954 -0.338 0.7354
ID2 -0.400706 0.484754 -0.827 0.4090
ID3 -0.021455 0.406260 -0.053 0.9579
ID8 0.247817 0.550162 0.450 0.6527
ID10 0.114426 0.648556 0.176 0.8601
ID12 0.318970 0.538649 0.592 0.5541
ID14 -0.098951 0.618344 -0.160 0.8730
ID16 0.072229 0.597492 0.121 0.9038
ID20 -0.183721 0.628544 -0.292 0.7702
ID24 0.163993 0.501472 0.327 0.7438
ID30 0.279585 0.457183 0.612 0.5412
ID37 0.350891 0.598379 0.586 0.5580
ID43 0.396803 0.528499 0.751 0.4533
ID45 -0.510483 0.460247 -1.109 0.2681
ID58 0.065428 0.623718 0.105 0.9165
ID64 0.419852 0.607499 0.691 0.4899
ID68 -0.108707 0.616558 -0.176 0.8601
ID69 -0.160725 0.664430 -0.242 0.8090
ID72 0.275853 0.537597 0.513 0.6082
ID73 0.599945 0.733991 0.817 0.4143
ID74 -0.504465 0.989021 -0.510 0.6103
ID75 -1.462305 1.876139 -0.779 0.4363
ID78 -0.276681 0.894529 -0.309 0.7573
ID80 0.031620 0.784600 0.040 0.9679
ID83 -1.517451 1.791424 -0.847 0.3975
ID85 -0.251443 0.876600 -0.287 0.7744
ID87 -1.988398 2.425787 -0.820 0.4129
ID90 0.369619 0.811782 0.455 0.6492
ID92 -0.379618 0.919689 -0.413 0.6800
ID97 -0.728993 1.252908 -0.582 0.5610
ID98 -0.225467 1.014555 -0.222 0.8243
ID100 -2.097269 2.360245 -0.889 0.3748
ID101 -2.369203 2.213498 -1.070 0.2852
ID104 0.812749 0.704261 1.154 0.2493
ID106 -0.975127 1.396775 -0.698 0.4856
ID110 -0.839775 1.391806 -0.603 0.5466
ID112 -1.014657 1.189767 -0.853 0.3943
Gen3:ID2 -0.307803 0.739524 -0.416 0.6775
Gen4:ID2 0.606803 0.605179 1.003 0.3167
Gen5:ID2 0.466544 0.657278 0.710 0.4783
Gen6:ID2 NA NA NA NA
Gen3:ID3 -0.276063 0.608762 -0.453 0.6505
Gen4:ID3 0.223899 0.533329 0.420 0.6749
Gen5:ID3 0.507727 0.558195 0.910 0.3637
Gen6:ID3 NA NA NA NA
Gen3:ID8 -0.366896 0.682695 -0.537 0.5913
Gen4:ID8 -0.934527 0.673860 -1.387 0.1664
Gen5:ID8 0.763515 0.655545 1.165 0.2449
Gen6:ID8 -1.804249 0.837205 -2.155 0.0318 *
Gen3:ID10 -0.419608 0.803372 -0.522 0.6018
Gen4:ID10 -0.094649 0.735702 -0.129 0.8977
Gen5:ID10 1.060966 0.737561 1.438 0.1512
Gen6:ID10 0.020512 0.755992 0.027 0.9784
Gen3:ID12 -0.378844 0.658266 -0.576 0.5653
Gen4:ID12 -0.958024 0.635016 -1.509 0.1323
Gen5:ID12 0.357942 0.641027 0.558 0.5769
Gen6:ID12 -1.262101 0.701813 -1.798 0.0730 .
Gen3:ID14 0.121812 0.754102 0.162 0.8718
Gen4:ID14 -0.512795 0.736078 -0.697 0.4865
Gen5:ID14 0.780078 0.717626 1.087 0.2778
Gen6:ID14 -0.045252 0.735141 -0.062 0.9510
Gen3:ID16 -0.061494 0.741496 -0.083 0.9340
Gen4:ID16 -1.202151 0.759309 -1.583 0.1143
Gen5:ID16 0.342530 0.717455 0.477 0.6334
Gen6:ID16 0.100265 0.712884 0.141 0.8882
Gen3:ID20 -0.151832 0.760273 -0.200 0.8418
Gen4:ID20 0.257355 0.700666 0.367 0.7136
Gen5:ID20 1.222094 0.712657 1.715 0.0872 .
Gen6:ID20 0.079012 0.729305 0.108 0.9138
Gen3:ID24 NA NA NA NA
Gen4:ID24 NA NA NA NA
Gen5:ID24 0.246244 0.690904 0.356 0.7217
Gen6:ID24 NA NA NA NA
Gen3:ID30 NA NA NA NA
Gen4:ID30 -1.831454 0.834158 -2.196 0.0288 *
Gen5:ID30 0.222646 0.677205 0.329 0.7425
Gen6:ID30 NA NA NA NA
Gen3:ID37 -1.148642 0.807422 -1.423 0.1557
Gen4:ID37 -0.211171 0.697251 -0.303 0.7622
Gen5:ID37 0.222909 0.709411 0.314 0.7535
Gen6:ID37 -0.468397 0.726377 -0.645 0.5194
Gen3:ID43 -0.394548 0.645279 -0.611 0.5413
Gen4:ID43 -0.235858 0.602237 -0.392 0.6956
Gen5:ID43 0.460569 0.625507 0.736 0.4620
Gen6:ID43 -1.422365 0.692022 -2.055 0.0406 *
Gen3:ID45 -0.255481 0.698781 -0.366 0.7149
Gen4:ID45 0.615898 0.578706 1.064 0.2879
Gen5:ID45 0.749337 0.603219 1.242 0.2150
Gen6:ID45 NA NA NA NA
Gen3:ID58 0.312379 0.732627 0.426 0.6701
Gen4:ID58 -0.298223 0.705544 -0.423 0.6728
Gen5:ID58 0.897457 0.714474 1.256 0.2099
Gen6:ID58 -0.269180 0.748447 -0.360 0.7193
Gen3:ID64 -0.025388 0.725404 -0.035 0.9721
Gen4:ID64 -1.448796 0.752154 -1.926 0.0549 .
Gen5:ID64 -0.005864 0.723116 -0.008 0.9935
Gen6:ID64 -0.960080 0.750582 -1.279 0.2017
Gen3:ID68 -0.650878 0.787310 -0.827 0.4090
Gen4:ID68 -0.517148 0.735852 -0.703 0.4826
Gen5:ID68 0.511472 0.728950 0.702 0.4834
Gen6:ID68 -0.089564 0.748860 -0.120 0.9049
Gen3:ID69 0.678626 0.769246 0.882 0.3783
Gen4:ID69 -1.151739 0.812405 -1.418 0.1572
Gen5:ID69 0.913132 0.763121 1.197 0.2323
Gen6:ID69 0.041416 0.775749 0.053 0.9575
Gen3:ID72 -0.107015 0.650276 -0.165 0.8694
Gen4:ID72 -1.128138 0.636728 -1.772 0.0773 .
Gen5:ID72 0.059203 0.644784 0.092 0.9269
Gen6:ID72 -1.197553 0.694305 -1.725 0.0854 .
Gen3:ID73 NA NA NA NA
Gen4:ID73 NA NA NA NA
Gen5:ID73 NA NA NA NA
Gen6:ID73 NA NA NA NA
Gen3:ID74 NA NA NA NA
Gen4:ID74 -0.049553 1.300382 -0.038 0.9696
Gen5:ID74 1.156798 1.216100 0.951 0.3421
Gen6:ID74 NA NA NA NA
Gen3:ID75 NA NA NA NA
Gen4:ID75 NA NA NA NA
Gen5:ID75 NA NA NA NA
Gen6:ID75 NA NA NA NA
Gen3:ID78 NA NA NA NA
Gen4:ID78 NA NA NA NA
Gen5:ID78 -0.564074 1.455171 -0.388 0.6985
Gen6:ID78 NA NA NA NA
Gen3:ID80 -1.426381 1.518337 -0.939 0.3481
Gen4:ID80 -0.551152 1.142841 -0.482 0.6299
Gen5:ID80 0.866075 0.957986 0.904 0.3666
Gen6:ID80 NA NA NA NA
Gen3:ID83 0.847253 2.019075 0.420 0.6750
Gen4:ID83 1.695294 1.892558 0.896 0.3710
Gen5:ID83 0.755477 2.237085 0.338 0.7358
Gen6:ID83 0.977479 2.050248 0.477 0.6338
Gen3:ID85 -0.539119 1.201553 -0.449 0.6539
Gen4:ID85 -0.111520 1.089472 -0.102 0.9185
Gen5:ID85 -0.330739 1.344317 -0.246 0.8058
Gen6:ID85 -0.532982 1.418961 -0.376 0.7074
Gen3:ID87 NA NA NA NA
Gen4:ID87 NA NA NA NA
Gen5:ID87 1.502677 2.695457 0.557 0.5775
Gen6:ID87 NA NA NA NA
Gen3:ID90 -1.232501 1.173873 -1.050 0.2945
Gen4:ID90 -2.327631 1.433981 -1.623 0.1054
Gen5:ID90 -0.063752 0.990668 -0.064 0.9487
Gen6:ID90 -1.339447 1.201562 -1.115 0.2657
Gen3:ID92 -0.782119 1.335260 -0.586 0.5584
Gen4:ID92 -0.710894 1.285595 -0.553 0.5806
Gen5:ID92 1.566795 1.027148 1.525 0.1281
Gen6:ID92 -0.261706 1.239341 -0.211 0.8329
Gen3:ID97 -0.267420 1.654094 -0.162 0.8717
Gen4:ID97 0.380419 1.470381 0.259 0.7960
Gen5:ID97 0.637177 1.590949 0.401 0.6890
Gen6:ID97 0.365301 1.672508 0.218 0.8272
Gen3:ID98 -0.662733 1.328294 -0.499 0.6181
Gen4:ID98 -1.269260 1.388927 -0.914 0.3614
Gen5:ID98 -0.737178 1.469882 -0.502 0.6163
Gen6:ID98 -1.129207 1.628438 -0.693 0.4885
Gen3:ID100 1.457531 2.532780 0.575 0.5653
Gen4:ID100 1.111118 2.577147 0.431 0.6666
Gen5:ID100 2.321790 2.509014 0.925 0.3554
Gen6:ID100 2.009830 2.554105 0.787 0.4319
Gen3:ID101 1.900613 2.326914 0.817 0.4146
Gen4:ID101 1.815185 2.322609 0.782 0.4350
Gen5:ID101 1.883482 2.418857 0.779 0.4367
Gen6:ID101 2.146589 2.376492 0.903 0.3670
Gen3:ID104 -1.167309 1.074136 -1.087 0.2779
Gen4:ID104 -2.606458 1.681809 -1.550 0.1221
Gen5:ID104 -1.139406 1.065882 -1.069 0.2858
Gen6:ID104 -2.033892 1.388089 -1.465 0.1437
Gen3:ID106 -0.220900 1.708075 -0.129 0.8972
Gen4:ID106 0.006833 1.634461 0.004 0.9967
Gen5:ID106 1.619138 1.499301 1.080 0.2809
Gen6:ID106 -0.144233 1.803312 -0.080 0.9363
Gen3:ID110 NA NA NA NA
Gen4:ID110 0.092853 1.670387 0.056 0.9557
Gen5:ID110 0.770948 1.696050 0.455 0.6497
Gen6:ID110 NA NA NA NA
Gen3:ID112 0.117874 1.468375 0.080 0.9361
Gen4:ID112 -0.027064 1.478746 -0.018 0.9854
Gen5:ID112 1.032842 1.376415 0.750 0.4535
Gen6:ID112 0.267973 1.529522 0.175 0.8610
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for quasipoisson family taken to be 75.14003)
Null deviance: 147594 on 512 degrees of freedom
Residual deviance: 27271 on 354 degrees of freedom
AIC: NA
Number of Fisher Scoring iterations: 5
seed_data$Fitted <- fitted(DD.glm)
seed_data <- mutate(seed_data,
resid2 = ((Seedlings/Nm1) - (Fitted/Nm1))^2)
ggplot(aes(1/Nm1, resid2), data = seed_data) + geom_point() + scale_y_log10() +
geom_smooth()
`geom_smooth()` using method = 'loess'
summary(lm(resid2 ~ I(1/Nm1), data = seed_data))
Call:
lm(formula = resid2 ~ I(1/Nm1), data = seed_data)
Residuals:
Min 1Q Median 3Q Max
-3131 -2730 -24 15 271770
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -21.57 829.56 -0.026 0.9793
I(1/Nm1) 3152.28 1149.10 2.743 0.0063 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 12600 on 511 degrees of freedom
Multiple R-squared: 0.01451, Adjusted R-squared: 0.01258
F-statistic: 7.525 on 1 and 511 DF, p-value: 0.006297
This does the trick. We still have a zero intercept. There appears to be one singleton remaining. The construction of seed_data is now complex enough that it should be in a munge script. And I still need to work out how to do the random effects model!
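For the random effects model, one possible direction (an untested sketch of my own; lme4 and this exact formula are assumptions, not something already in the project) is to refit the DD.lm structure with ID and the Gen-by-ID combination as random intercepts:

```r
# Untested sketch: a mixed-model analogue of DD.lm, treating genotype (ID)
# and the genotype-by-generation combination (Gen:ID) as random intercepts.
library(lme4)
DD.lmer <- lmer(log(Seedlings / Nm1) ~ log(Nm1) + Gen +
                  (1 | ID) + (1 | Gen:ID),
                data = seed_data)
summary(DD.lmer)
```

The ID and Gen:ID variance components would then estimate among-genotype and genotype-by-generation heterogeneity directly, rather than burning 150-odd fixed-effect degrees of freedom.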
Here’s what we “knew” on May 1:
Note that these were based on “unclean” data, but reanalysis with clean data led to the same conclusions (May 19).
Further observations:
In log(Seedlings/Adults) ~ log(Adults) (using 3p landscapes in both B and C), there appears to be a quadratic effect of generation, but this is largely driven by generation 6 (19 May).
So far I’ve addressed 2, 3, 10, and 11 in the May 1 “remaining empirical questions.”
Here’s what I think we want to lay out for the motivation of a paper:
Mark Lewis’s new book has a whole chapter on “stochastic spread:”
Our focus is on the rate at which such processes spread spatially. We first examine the effects of environmental stochasticity on spatial spread by means of stochastic integrodifference and reaction–diffusion models. Here, we analyze both the wave solution for the expected density of individuals and the wave solution for a given realization of the stochastic process, as well as the variability that this can exhibit. Then we turn our analysis to the effects of demographic stochasticity on spatial spread. This can be described by its effects on the expected density and also by its effects on the velocity of the furthest-forward location for the population. Finally, we consider nonlinear stochastic models for patchy spread of invasive species, showing how patchiness in the leading edge of an invasion process can dramatically slow the invasive spreading speed. (p. 233)
The stochastic integrodifference model has a time-varying population growth function and a time-varying dispersal kernel, but both are assumed to be spatially uniform. Here’s a useful commentary on stochastic kernels:
A dispersal kernel is a probability density function of a random variable, that random variable being the location of a dispersed offspring. In other words, a dispersal kernel is a guide, or template, prescribing how likely an offspring is to fall in a given infinitesimal location. As such, it is typically an ordinary function and is not usually a random variable itself. What, then, do we mean by referring to the kt as “random dispersal kernels”? We mean that the function kt is not known in advance but rather is defined randomly. For example, it could be that \(k_t = N(0,\nu)\) where \(\nu\) is a random variable. Thus, even the template for dispersal, and not just dispersal itself, is stochastic.
If growth and dispersal variability are uncorrelated, then the “average population” spreads at the rate given by the average environment. They call this the speed of the “expectation wave.” Positive autocorrelation between growth and dispersal will speed things up.
They then turn to the “stochastic wave,” which has location \(X_t\) and “average” speed (i.e., averaged from time 0 to time \(t\)) \(\bar{C_t}\). Assuming an exponential spatial distribution at time 0, they derive values for the expectation and variance of \(\bar{C_t}\) that depend on the steepness of the initial condition and the moment generating function of the “representative” kernel (they don’t really say how this is defined; it’s actually introduced in the previous analysis). They then show that, whereas \(\bar{C_t}\) converges to its expectation as \(t\) gets large, the variance of \(X_t\) increases linearly with time (p. 239; fig. 8.1). This is the “simple model” that I was going to develop; here it’s already been done for me!
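A quick toy simulation (my own sketch, not from the book) illustrates the same result: with iid per-generation displacements, the among-replicate variance of the front location grows linearly in time while the average speed settles down.

```r
# iid per-generation displacements c_t; front location X_t = c_1 + ... + c_t.
set.seed(1)
n_rep <- 1000; n_gen <- 50
c_mat <- matrix(rlnorm(n_rep * n_gen, meanlog = 0, sdlog = 0.5), n_rep, n_gen)
X <- t(apply(c_mat, 1, cumsum))     # n_rep x n_gen matrix of front locations
plot(1:n_gen, apply(X, 2, var),     # among-replicate variance of X_t
     xlab = "Generation", ylab = "var(X_t)")
abline(0, var(as.vector(c_mat)))    # predicted: slope = var(c), intercept 0
```

The average speed `X[, t] / t`, by contrast, converges to \(E[c]\) as \(t\) grows.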
One thing they note is that (at least with iid/independence between dispersal and growth) the speed of the stochastic wave will be less than the speed of the expectation wave. Thus, stochasticity slows spread, on average.
A key reference for this work seems to be Neubert, M.G., Kot, M., Lewis, M.A.: Invasion speeds in fluctuating environments. Proc. R. Soc. B 267, 1603–1610 (2000). doi:10.1098/rspb.2000.1185
In fact, fig. 8.1 above is fig 4 in Neubert et al (2000), which also has the observation about the linear growth in \(var(X_t)\) at the bottom of p. 1606. This is what we should actually cite!
They point out that it has been extended to structured populations by Schreiber, S.J., Ryan, M.E.: Invasion speeds for structured populations in fluctuating environments. Theor. Ecol. 4, 423–434 (2011). doi:10.1007/s12080-010-0098-5
They also briefly consider a stochastic reaction-diffusion equation. There, spatial autocorrelation in the temporally-varying growth environment leads to speeding up of spread.
The chapter starts with two lab examples: Tribolium from Melbourne and Hastings (2009), and ciliates from Giometto, A., Rinaldo, A., Carrara, F., Altermatt, F.: Emerging predictable features of replicated biological invasion fronts. Proc. Natl. Acad. Sci. U. S. A. 111(1), 297–301 (2014). doi:10.1073/pnas.1321167110 [this paper uses a stochastic PDE to fit the dynamics]. They accept the authors’ respective claims that Tribolium spread “could not be predicted precisely” but ciliate spread could be “predicted well.” However, the former claim refers to a single realization whereas the latter refers to the mean and variance across replicates.
Models henceforth focus on the furthest-forward individual. Brief treatment of a continuous-time linear branching process, which I don’t think is too relevant here. Then an extended treatment of Clark et al.’s “invasion by extremes” paper. They use a 1-parameter simplification of the 2Dt kernel: \[ k(x) = \frac{1}{2\sqrt{2\beta} \left(1 + \frac{x^2}{2\beta}\right)^{3/2}}; \] not sure if this was Clark’s original choice. The section doesn’t treat variance in spread but instead focuses on mean spread, with lower and upper bounds depending on assumptions about contributions from individuals behind the front.
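For later fitting, this 1-parameter kernel is easy to implement; the CDF even comes out in closed form (my own algebra from the density above, so worth re-deriving before trusting it):

```r
# The 1-parameter 2Dt-style kernel, with constants absorbed:
# k(x) = beta / (2*beta + x^2)^(3/2), equivalent to the expression above.
d2Dt1 <- function(x, beta) beta / (2 * beta + x^2)^(3/2)
# Hand-derived CDF: F(q) = 1/2 + q / (2*sqrt(2*beta + q^2))
p2Dt1 <- function(q, beta) 0.5 + q / (2 * sqrt(2 * beta + q^2))
integrate(d2Dt1, -Inf, Inf, beta = 2)  # sanity check: should be ~1
```

Note this is a density on the whole line; for distance data it would be folded onto \([0, \infty)\) by doubling the density, with CDF `2 * p2Dt1(q, beta) - 1`.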
They finish the chapter by talking about “patchy spread,” but here they mean something like stratified diffusion rather than a patchy environment. There’s probably some interesting stuff here but not needed for our current work.
Looked through the list of papers citing this work, and nothing new jumped out at me. In fact, it seems that none of the experimental spread papers cite this work!
Quick looks at dispersal. Lewis et al 2008 show how to calculate empirical moment generating functions, but I’m not sure how useful that is for us. Nathan et al. 2012 has a good review of kernels, but not much statistical info. I found Viana et al. 2016 which suggests fitting the CDF is better than fitting the PDF.
Viana also led me to the fitdistrplus package, which fits standard distributions. I think this is what we need; I was increasingly unconfident in my home-rolled fitting routines. The only rub is that I’ll need to write d, p, and q functions for any “special” distributions (e.g., 2Dt).
Jenn sent info about the Ler and RIL dispersal data.
The data from the sticky paper kernels for Ler are called
2013_08_08_Exp1_Spray.csv. And I’m attaching the script I used to analyze them [FitKernels_Exp1_SPRAY_aug2013.R (in Arabidopsis/analysis)].
I fit negative exponential kernels, which I know isn’t necessarily the best fit, but is what I needed for the simulations I ran (for the density dependence paper). For reasons I don’t know, the mean dispersal distance on sticky paper was further than in Generation 1. This of course assumes all seeds would have germinated, but we know mostly they do. Number of moms in control ranged from 2- 1200, but I found no significant relationship between density and dispersal distance. I used the estimate from the sticky paper data to keep m = 2 in my models (mean dispersal distance of ~0.5 pots). Otherwise, my simulated invasions moved much more slowly than they did in reality.
I then asked
I’m not quite sure how to interpret the Ler data you sent – e.g., sometimes “pot” is > 0, but most of the seeds in “pot_gen3” are in pot 0.
Jenn replied
As for the Ler data, we had started the experiment several generations earlier and then for a number of reasons decided to scrap it and start over. So we then took pots from a wide range of densities (from different places in the runways) to put next to the sticky paper. This means they weren’t all pot 0. Shouldn’t matter which pot they are (at least that was the idea). The pot numbers are just relics of this past.
You also asked about sticky paper data for the RILs. I don’t think we ever dispersed the RILs onto sticky paper, but we did have 2 trials onto empty pots (with solitary moms). Script that I used attached, which references the appropriate csv files. [FitDispersal_RILs_30May2016.R (in Arabidopsis/analysis)]
I asked
It looks like I’ve already used “2015_06_30_RilsDispersal.csv”; I’ll add the 2016 data (is there any reason not to combine them?).
Jenn replied
I don’t recall any reason not to combine the 2015 & 2016 RIL data. That said, when I ran through that script, it definitely looks like you can get a really different set of ranks depending on which data you use. I’ll look in the morning (at my paper notes) to see if there’s anything obviously different about how the data were collected.
I think that the RIL data were the ones where I identified some data entry issues when I was in Zurich. I’ll need to double check that this file has the corrected data.
Here is the new file data/disperseLer.R:
### Creates the data object disperseLer, representing the Ler dispersal experiment
# Get data from Jenn's file
data_dir <- "~/Dropbox/Arabidopsis/analysis"
disperseLer <- read.csv(file.path(data_dir, '2013_08_08_Exp1_Spray.csv'),
header = TRUE)
# Drop the "clipped" treatment
disperseLer <- droplevels(subset(disperseLer, new_trt != "clipped", drop = TRUE))
# Drop the columns with the (irrelevant) info about where the mom pots came from
disperseLer <- disperseLer[, -c(1:4, 6)]
# Clean up column names
names(disperseLer) <- c("ID", "Pot", "Distance", "Seedlings", "Siliques", "Density",
"Treatment")
# Make some factor variables
disperseLer$ID <- as.factor(disperseLer$ID)
# Clean up the workspace
rm(data_dir)
# Auto-cache the data
ProjectTemplate::cache("disperseLer")
Here is a file to summarize the data by replicate, in lib/helpers.R:
calc_dispersal_stats <- function(dispersal_data) {
dispersal_stats <- group_by(dispersal_data, ID, Density, Siliques) %>%
summarise(
Total_seeds = sum(Seedlings),
Home_seeds = Seedlings[1],  # assumes rows are ordered by Distance (home pot first)
Dispersing_seeds = Total_seeds - Home_seeds,
Dispersal_fraction = Dispersing_seeds / Total_seeds,
# Dispersal stats including all seeds
Mean_dispersal_all = sum(Seedlings * (Distance - 4)) / Total_seeds,
RMS_dispersal_all = sqrt(sum(Seedlings * (Distance - 4)^2) / Total_seeds),
# Dispersal stats including just dispersing seeds
Mean_dispersal = sum(Seedlings * (Distance - 4)) / Dispersing_seeds,
RMS_dispersal = sqrt(sum(Seedlings * (Distance - 4)^2) / Dispersing_seeds)
)
return (dispersal_stats)
}
Let’s look at how the total seed number, the fraction dispersing, and the dispersal distances scale with density:
Ler_dispersal_stats <- calc_dispersal_stats(disperseLer)
qplot(Density, Total_seeds, data = Ler_dispersal_stats)
qplot(Density, Dispersal_fraction, data = Ler_dispersal_stats)
qplot(Density, Mean_dispersal_all, data = Ler_dispersal_stats)
qplot(Density, Mean_dispersal, data = Ler_dispersal_stats)
qplot(Density, RMS_dispersal_all, data = Ler_dispersal_stats)
qplot(Density, RMS_dispersal, data = Ler_dispersal_stats)
Except for total seed number, there don’t seem to be any patterns here. For the single plants, let’s look at whether silique number (which might be correlated with plant size) can predict anything:
qplot(Siliques, Total_seeds, data = subset(Ler_dispersal_stats, !is.na(Siliques)))
qplot(Siliques, Dispersal_fraction, data = subset(Ler_dispersal_stats, !is.na(Siliques)))
qplot(Siliques, Mean_dispersal_all, data = subset(Ler_dispersal_stats, !is.na(Siliques)))
qplot(Siliques, Mean_dispersal, data = subset(Ler_dispersal_stats, !is.na(Siliques)))
qplot(Siliques, RMS_dispersal_all, data = subset(Ler_dispersal_stats, !is.na(Siliques)))
qplot(Siliques, RMS_dispersal, data = subset(Ler_dispersal_stats, !is.na(Siliques)))
Again, no patterns. This makes modeling simpler!
One more thing to look at: are the mean dispersal distances correlated with either the fraction dispersing or the number of dispersing seeds?
qplot(Dispersing_seeds, Mean_dispersal, data = Ler_dispersal_stats)
qplot(Dispersing_seeds, RMS_dispersal, data = Ler_dispersal_stats)
qplot(Dispersal_fraction, Mean_dispersal, data = Ler_dispersal_stats)
qplot(Dispersal_fraction, RMS_dispersal, data = Ler_dispersal_stats)
Here we see a correlation with both. This could just be a sampling effect; once we fit kernel parameters we’ll have to see if they have correlations as well. If so, that suggests that the same variables (such as canopy height and structure, and intensity of simulated rainfall) that affect whether seeds leave the home pot also affect how far they go.
Here’s a multiple regression just to tease out the two variables:
summary(lm(Mean_dispersal ~ Dispersal_fraction + Dispersing_seeds, data = Ler_dispersal_stats))
Call:
lm(formula = Mean_dispersal ~ Dispersal_fraction + Dispersing_seeds,
data = Ler_dispersal_stats)
Residuals:
Min 1Q Median 3Q Max
-1.7144 -0.5299 -0.2285 0.3843 4.4228
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 6.0255317 0.4449107 13.543 1.76e-15 ***
Dispersal_fraction 2.5229534 1.1204944 2.252 0.0307 *
Dispersing_seeds 0.0016311 0.0008397 1.942 0.0602 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.148 on 35 degrees of freedom
Multiple R-squared: 0.3358, Adjusted R-squared: 0.2979
F-statistic: 8.849 on 2 and 35 DF, p-value: 0.000776
summary(lm(RMS_dispersal ~ Dispersal_fraction + Dispersing_seeds, data = Ler_dispersal_stats))
Call:
lm(formula = RMS_dispersal ~ Dispersal_fraction + Dispersing_seeds,
data = Ler_dispersal_stats)
Residuals:
Min 1Q Median 3Q Max
-31.558 -11.516 -5.909 7.545 97.275
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 40.98162 9.35833 4.379 0.000103 ***
Dispersal_fraction 44.97236 23.56868 1.908 0.064603 .
Dispersing_seeds 0.03231 0.01766 1.829 0.075904 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 24.15 on 35 degrees of freedom
Multiple R-squared: 0.2861, Adjusted R-squared: 0.2453
F-statistic: 7.015 on 2 and 35 DF, p-value: 0.002743
It appears that both are playing a role. But the dispersal fraction is the stronger predictor, which suggests that there is a real mechanism here, and not just sampling reaching further into the kernel.
Our data are actually interval-censored, and we need to make sure we take that into account. I believe the likelihood functions I wrote previously do, but they lack the rigor of fitdistrplus. Unfortunately, the data have another issue: the seeds in pot zero are a mixture of “non-dispersers” (i.e., seeds that fall straight down and aren’t part of the kernel) and “short-distance dispersers” (seeds moving according to a dispersal kernel, but traveling less than 3 cm). Accounting for this requires either writing a zero-inflated distribution (which would require custom functions for everything) or telling the objective function to truncate the distribution. Again, I believe that my prior functions do the latter, but I’m not truly confident of that. Unfortunately, there doesn’t seem to be an obvious way to account for this in fitdistrplus using out-of-the-box distributions.
Actually, I take that back—the issue of truncation is addressed in the FAQs. It does require writing new distribution functions, but they are pretty straightforward. Here is their example for a (doubly) truncated exponential distribution:
dtexp <- function(x, rate, low, upp)
{
PU <- pexp(upp, rate=rate)
PL <- pexp(low, rate=rate)
dexp(x, rate) / (PU-PL) * (x >= low) * (x <= upp)
}
ptexp <- function(q, rate, low, upp)
{
PU <- pexp(upp, rate=rate)
PL <- pexp(low, rate=rate)
(pexp(q, rate)-PL) / (PU-PL) * (q >= low) * (q <= upp) + 1 * (q > upp)
}
Let’s start by just plotting the raw data. The first plot is just the empirical kernels; the second puts the vertical axis on a log scale (so exponential distributions should look linear and normal distributions should look quadratic) and the third puts both axes on a log scale (so lognormal distributions should look quadratic).
ggplot(aes(x = Distance, y = Seedlings), data = subset(disperseLer, Distance > 4)) +
geom_line() + facet_wrap(~ID, scales = "free_y")
ggplot(aes(x = Distance, y = Seedlings), data = subset(disperseLer, Distance > 4)) +
geom_line() + facet_wrap(~ID, scales = "free_y") + scale_y_log10()
ggplot(aes(x = Distance, y = Seedlings), data = subset(disperseLer, Distance > 4)) +
geom_line() + facet_wrap(~ID, scales = "free_y") + scale_y_log10() + scale_x_log10()
In many cases, it seems like a (truncated) normal distribution might be best! Rep 87_1 is going to be particularly problematic, as it is strongly sigmoid.
I’ll start by selecting a single rep with a large number of seeds and a well-behaved distribution. Let’s use 87_0.
Pull out the data and convert into the format required by fitdistrplus:
data_87_0 <- subset(disperseLer, ID == "87_0" & Distance > 4)
data_vec <- rep(data_87_0$Distance, data_87_0$Seedlings)
cens_data_tble <- data.frame(left = data_vec - 4.5, right = data_vec - 3.5)
Now construct the distribution functions for a truncated normal. I code in the “standard” truncation level.
dtnorm <- function(x, mean, sd, low = 3.5)
{
  PU <- 1
  PL <- pnorm(low, mean = mean, sd = sd)
  dnorm(x, mean, sd) / (PU - PL) * (x >= low)
}
ptnorm <- function(q, mean, sd, low = 3.5)
{
  PU <- 1
  PL <- pnorm(low, mean = mean, sd = sd)
  (pnorm(q, mean, sd) - PL) / (PU - PL) * (q >= low)
}
So now let’s give it a try:
library(fitdistrplus)
Loading required package: MASS
Attaching package: 'MASS'
The following object is masked from 'package:dplyr':
select
Loading required package: survival
Loading required package: methods
fit_tnorm <- fitdistcens(cens_data_tble, "tnorm", start = list(mean = 6, sd = 10))
summary(fit_tnorm)
Fitting of the distribution ' tnorm ' By maximum likelihood on censored data
Parameters
estimate Std. Error
mean 7.681504 0.5930568
sd 5.588817 0.3661223
Fixed parameters:
data frame with 0 columns and 0 rows
Loglikelihood: -1242.553 AIC: 2489.106 BIC: 2497.343
Correlation matrix:
mean sd
mean 1.000000 -0.803976
sd -0.803976 1.000000
cdfcompcens(fit_tnorm, Turnbull.confint = TRUE)
This looks like a pretty darn good fit!
Interestingly, using the non-censored method gives almost identical parameter estimates and CDF:
fit3 <- fitdist(data_vec - 4, "tnorm", start = list(mean = 6, sd = 10))
summary(fit3)
Fitting of the distribution ' tnorm ' by maximum likelihood
Parameters :
estimate Std. Error
mean 7.696363 0.5870556
sd 5.583054 0.3626565
Loglikelihood: -1242.376 AIC: 2488.752 BIC: 2496.988
Correlation matrix:
mean sd
mean 1.000000 -0.801385
sd -0.801385 1.000000
cdfcomp(fit3)
Thus, we can use some of the other diagnostic plots for non-censored data:
denscomp(fit3, breaks = 3:23+0.5, plotstyle = "ggplot", xlim = c(3, 24))
# qqcomp(fit3) # This one requires defining a quantile function!
ppcomp(fit3)
Unfortunately, we need additional work to get qqcomp working; and denscomp only gives a sensible-looking histogram with tweaking (and even then, the bins are not centered appropriately, so the histogram is shifted).
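Since qqcomp only needs a quantile function (fitdistrplus looks it up by the usual d/p/q naming convention), here is a minimal sketch of one for the left-truncated normal, obtained by inverting the ptnorm defined above:

```r
# Sketch: quantile function matching dtnorm/ptnorm above. Inverting
# p = (pnorm(q) - PL) / (1 - PL) gives q = qnorm(PL + p * (1 - PL)).
qtnorm <- function(p, mean, sd, low = 3.5)
{
  PL <- pnorm(low, mean = mean, sd = sd)
  qnorm(PL + p * (1 - PL), mean = mean, sd = sd)
}
```

With this defined, qqcomp(fit3) should find the quantile function automatically; I haven’t verified the plot itself.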
Here’s a rough draft of a better figure:
xx <- seq(3.3, max(fit3$data) + 0.7, 0.1)
plot(xx, dtnorm(xx, fit3$estimate[1], fit3$estimate[2]), type = "l", col = "red")
points(table(fit3$data) / length(fit3$data))
We can make analogous functions for the lognormal distribution:
dtlnorm <- function(x, meanlog, sdlog, low = 3.5)
{
  PU <- 1
  PL <- plnorm(low, meanlog = meanlog, sdlog = sdlog)
  dlnorm(x, meanlog, sdlog) / (PU - PL) * (x >= low)
}
ptlnorm <- function(q, meanlog, sdlog, low = 3.5)
{
  PU <- 1
  PL <- plnorm(low, meanlog = meanlog, sdlog = sdlog)
  (plnorm(q, meanlog, sdlog) - PL) / (PU - PL) * (q >= low)
}
Now fit to the censored data and compare with our normal fit:
fit_tlnorm <- fitdistcens(cens_data_tble, "tlnorm", start = list(meanlog = log(6), sdlog = log(10)))
summary(fit_tlnorm)
Fitting of the distribution ' tlnorm ' By maximum likelihood on censored data
Parameters
estimate Std. Error
meanlog 2.1737212 0.02373376
sdlog 0.4584419 0.01879380
Fixed parameters:
data frame with 0 columns and 0 rows
Loglikelihood: -1254.488 AIC: 2512.976 BIC: 2521.212
Correlation matrix:
meanlog sdlog
meanlog 1.0000000 -0.2594226
sdlog -0.2594226 1.0000000
cdfcompcens(list(fit_tnorm, fit_tlnorm), Turnbull.confint = TRUE,
legendtext = c("truncated normal", "truncated lognormal"))
The CDF of the log-normal is not as good, and the AIC is about 24 units worse.
Start by writing a function that takes a single replicate’s data and does the analysis:
fit_dispersal_models <- function(dispersal_data) {
data_loc <- subset(dispersal_data, Distance > 4)
data_vec <- rep(data_loc$Distance, data_loc$Seedlings)
cens_data_tble <- data.frame(left = data_vec - 4.5, right = data_vec - 3.5)
fit_tnorm <- fitdistcens(cens_data_tble, "tnorm", start = list(mean = 6, sd = 10))
fit_tlnorm <- fitdistcens(cens_data_tble, "tlnorm", start = list(meanlog = log(6), sdlog = log(10)))
cdfcompcens(list(fit_tnorm, fit_tlnorm), Turnbull.confint = TRUE, main = dispersal_data$ID[1],
legendtext = c("truncated normal", "truncated lognormal"))
stnorm <- summary(fit_tnorm)
stlnorm <- summary(fit_tlnorm)
data.frame(ID = dispersal_data$ID[1], AICnorm = stnorm$aic, AIClnorm = stlnorm$aic,
mu = stnorm$est[1], sd = stnorm$est[2],
mulog = stlnorm$est[1], sdlog = stlnorm$est[2])
}
Test it out:
fit_dispersal_models(subset(disperseLer, ID == "87_0"))
ID AICnorm AIClnorm mu sd mulog sdlog
mean 87_0 2489.106 2512.976 7.681504 5.588817 2.173721 0.4584419
ddply(subset(disperseLer, ID == "87_0" | ID == "128_1"), "ID",
fit_dispersal_models)
ID AICnorm AIClnorm mu sd mulog sdlog
1 128_1 4388.528 4368.020 -41.254702 13.903769 1.723583 0.5179114
2 87_0 2489.106 2512.976 7.681504 5.588817 2.173721 0.4584419
Full run
all_Ler_fits <- ddply(disperseLer, "ID", fit_dispersal_models)
Warning in sqrt(diag(varcovar)): NaNs produced
Warning in sqrt(1/diag(V)): NaNs produced
Warning in cov2cor(varcovar): diag(.) had 0 or NA entries; non-finite
result is doubtful
all_Ler_fits
ID AICnorm AIClnorm mu sd mulog sdlog
1 100_0 210.19617 209.46451 4.0982669 2.598637 1.662753 0.3158626
2 101_0 602.77572 592.94699 5.1014741 3.277311 1.839458 0.3314258
3 104_0 3917.28412 3922.72793 -66.0205798 19.643467 1.781924 0.6310585
4 105_0 1749.42577 1745.88089 4.3716320 3.528681 1.788268 0.3682102
5 106_0 646.69449 643.17978 -207.7151988 30.384323 1.307769 0.7884596
6 107_0 785.44720 789.47637 -69.4381348 16.731328 1.359045 0.6932002
7 107_2 1063.19962 1049.11621 -70.4622184 16.881924 1.740646 0.4708113
8 108_0 3238.39521 3253.82893 5.1491554 5.733904 2.030345 0.4653378
9 109_0 986.23168 993.20962 0.3779014 6.555358 1.830760 0.5150772
10 118_0 204.70929 204.18931 5.0644220 2.761443 1.771622 0.3257241
11 119_0 159.50334 161.20702 -6.3007402 10.223254 1.877411 0.6364096
12 125_0 1535.73278 1535.21286 4.3296332 3.730306 1.802977 0.3811286
13 128_1 4388.52816 4368.01998 -41.2547017 13.903769 1.723583 0.5179114
14 131_0 238.69069 240.39449 -4.1084462 5.925933 1.515535 0.5133134
15 133_0 157.21511 155.12266 5.1511826 3.886229 1.897703 0.3621337
16 134_0 309.97239 302.59321 3.1582072 4.429201 1.839981 0.3572949
17 135_0 2047.52526 2051.75059 6.8726239 4.285294 2.044296 0.3991739
18 73_0 1532.73721 1536.47422 3.8581071 5.121355 1.901724 0.4439891
19 75_0 1268.28878 1260.59024 4.1641279 5.606950 1.984015 0.4344484
20 75_1 713.93094 712.15268 5.1754515 3.910981 1.888703 0.3829246
21 77_0 1155.49401 1152.37149 1.9264317 4.898033 1.780264 0.4203578
22 78_2 3036.48468 3056.08625 6.5758420 5.186990 2.078296 0.4447256
23 79_0 16.43216 16.74522 4.7295954 1.069369 1.567024 0.1989932
24 85_1 732.95641 721.38097 2.8431735 4.174413 1.781615 0.3625076
25 85_2 4355.97098 4360.51040 3.7176899 5.757882 1.957874 0.4585569
26 87_0 2489.10649 2512.97557 7.6815043 5.588817 2.173721 0.4584419
27 87_1 4843.91681 4931.83322 8.2138623 4.811877 2.161850 0.4406535
28 87_2 2518.89435 2497.62958 3.4870029 4.699188 1.861861 0.4004055
29 88_0 505.51054 498.61356 3.1493213 3.455050 1.709980 0.3395724
30 88_1 632.33299 627.93489 0.9579404 4.372635 1.677200 0.3862587
31 89_1 2374.42168 2376.20474 -34.9798971 15.374743 1.835881 0.6015254
32 90_0 1414.00768 1402.33081 4.8038650 4.040552 1.881511 0.3774301
33 90_1 127.58152 128.43750 -14.7687132 7.623299 1.276556 0.5866747
34 91_2 3391.73490 3426.96778 2.8737605 7.010140 1.991896 0.5304049
35 93_0 1029.93150 1033.78692 2.0282346 5.749507 1.853023 0.4719776
36 95_0 639.50220 636.32519 -4.6097825 6.728016 1.717181 0.4532293
37 98_0 1956.80573 1995.93158 12.3030722 4.651470 2.468955 0.3797439
38 99_0 1252.70855 1262.56877 2.1108985 6.230271 1.889331 0.5023501
Some AIC stats
sum(all_Ler_fits$AICnorm)
[1] 58230.28
sum(all_Ler_fits$AIClnorm)
[1] 58366.17
cbind(all_Ler_fits$ID, all_Ler_fits$AICnorm - all_Ler_fits$AIClnorm)
[,1] [,2]
[1,] 1 0.7316622
[2,] 2 9.8287266
[3,] 3 -5.4438047
[4,] 4 3.5448752
[5,] 5 3.5147126
[6,] 6 -4.0291707
[7,] 7 14.0834126
[8,] 8 -15.4337230
[9,] 9 -6.9779437
[10,] 10 0.5199870
[11,] 11 -1.7036794
[12,] 12 0.5199187
[13,] 13 20.5081874
[14,] 14 -1.7037993
[15,] 15 2.0924486
[16,] 16 7.3791782
[17,] 17 -4.2253246
[18,] 18 -3.7370105
[19,] 19 7.6985462
[20,] 20 1.7782603
[21,] 21 3.1225217
[22,] 22 -19.6015753
[23,] 23 -0.3130643
[24,] 24 11.5754364
[25,] 25 -4.5394237
[26,] 26 -23.8690776
[27,] 27 -87.9164058
[28,] 28 21.2647714
[29,] 29 6.8969805
[30,] 30 4.3980948
[31,] 31 -1.7830564
[32,] 32 11.6768719
[33,] 33 -0.8559744
[34,] 34 -35.2328843
[35,] 35 -3.8554141
[36,] 36 3.1770114
[37,] 37 -39.1258517
[38,] 38 -9.8602207
So there’s variation among reps in which model fits better; summed across reps, the normal distribution is better. However, the normal parameter estimates are much more variable, in ways that really don’t make sense biologically, unless there is directionally biased dispersal that varies from rep to rep.
Warning in .load.config(override.config): Your configuration file
is missing the following entries: data_loading_header, data_ignore,
logging_level, cache_loaded_data, sticky_variables. Defaults will be used.
Warning in .check.version(config): Your configuration is compatible with version 0.7 of the ProjectTemplate package.
Please run ProjectTemplate::migrate.project() to migrate to the installed version 0.8.
The next steps in the Ler dispersal analysis are to confirm statistical support for among-rep heterogeneity and see if the kernel parameters are correlated with the fraction dispersing.
Added the functions for the truncated distributions and to fit the models to lib/helpers.R.
The total AIC for the rep-specific fits was 58230.28 and 58366.17 for the normal and lognormal models, respectively.
I think that if I pass the whole data set to the analysis function it will combine all the data. Let’s look at this:
sum(disperseLer$Seedlings[disperseLer$Distance > 4])
[1] 11928
data_loc <- subset(disperseLer, Distance > 4)
data_vec <- rep(data_loc$Distance, data_loc$Seedlings)
length(data_vec)
[1] 11928
That’s a match!
fit_dispersal_models(disperseLer)
ID AICnorm AIClnorm mu sd mulog sdlog
mean 73_0 59772.17 59799.07 -0.7149118 7.407026 1.905282 0.50168
Ignore the ID on the graphical and text output.
What we see are good fits overall. Again, the normal is somewhat better than the lognormal (\(\Delta_{\text{AIC}} =\) 26.9). But the rep-specific fits have AICs that are more than 1400 AIC units lower!
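As a sanity check on that comparison: AIC already charges the rep-specific model for its extra parameters, so the gap is not an artifact of model size. Using the totals recorded above:

```r
# The pooled normal fit has 2 parameters; the 38 rep-specific fits have 76.
# AIC penalizes each extra parameter by 2 units, so the rep-specific total
# already pays a 2 * 74 = 148-unit handicap and still wins comfortably:
59772.17 - 58230.28   # pooled AIC minus rep-specific total, ~1542 units
```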
Regenerate the datasets, without the plots (may want to put this into a munge script):
all_Ler_fits <- ddply(disperseLer, "ID", fit_dispersal_models, plot.it = FALSE)
Warning in sqrt(diag(varcovar)): NaNs produced
Warning in sqrt(1/diag(V)): NaNs produced
Warning in cov2cor(varcovar): diag(.) had 0 or NA entries; non-finite
result is doubtful
Ler_dispersal_stats <- calc_dispersal_stats(disperseLer)
Ler_dispersal_stats <- merge(Ler_dispersal_stats, all_Ler_fits, by = "ID")
head(Ler_dispersal_stats)
ID Density Siliques Total_seeds Home_seeds Dispersing_seeds
1 100_0 1 126 131 73 58
2 101_0 1 177 336 194 142
3 104_0 390 NA 1384 628 756
4 105_0 1227 NA 1180 762 418
5 106_0 107 NA 389 257 132
6 107_0 199 NA 946 773 173
Dispersal_fraction Mean_dispersal_all RMS_dispersal_all Mean_dispersal
1 0.4427481 2.580153 4.034205 5.827586
2 0.4226190 2.875000 4.663690 6.802817
3 0.5462428 4.593931 7.099011 8.410053
4 0.3542373 2.363559 4.197760 6.672249
5 0.3393316 2.611825 5.267156 7.696970
6 0.1828753 1.286469 3.335535 7.034682
RMS_dispersal AICnorm AIClnorm mu sd mulog
1 6.062889 210.1962 209.4645 4.098267 2.598637 1.662753
2 7.173896 602.7757 592.9470 5.101474 3.277311 1.839458
3 9.605168 3917.2841 3922.7279 -66.020580 19.643467 1.781924
4 7.052944 1749.4258 1745.8809 4.371632 3.528681 1.788268
5 9.041990 646.6945 643.1798 -207.715199 30.384323 1.307769
6 7.799881 785.4472 789.4764 -69.438135 16.731328 1.359045
sdlog se_mu se_sd se_mulog se_sdlog
1 0.3158626 1.330679703 0.634838130 0.06392084 0.04807686
2 0.3314258 0.810572636 0.441435119 0.03321615 0.02656491
3 0.6310585 NaN NaN 0.04842916 0.03103006
4 0.3682102 0.648281310 0.312409282 0.02473200 0.01922509
5 0.7884596 0.007112144 0.004535006 0.30807962 0.13543948
6 0.6932002 21.032531014 2.279795670 0.21971090 0.10060063
Now look at some plots
qplot(Dispersal_fraction, mulog, data = Ler_dispersal_stats)
qplot(Dispersal_fraction, sdlog, data = Ler_dispersal_stats)
qplot(Dispersal_fraction, mu, data = Ler_dispersal_stats)
qplot(Dispersal_fraction, sd, data = Ler_dispersal_stats)
There is a clear pattern with mulog, but nothing else. This is another reason to favor the lognormal distribution, as it indicates a mechanistic interconnection between the two components of the dispersal kernel. Here’s the actual relationship:
cor(Ler_dispersal_stats$mulog, Ler_dispersal_stats$Dispersal_fraction)
[1] 0.5604027
summary(lm(mulog ~ Dispersal_fraction, data = Ler_dispersal_stats))
Call:
lm(formula = mulog ~ Dispersal_fraction, data = Ler_dispersal_stats)
Residuals:
Min 1Q Median 3Q Max
-0.46413 -0.07753 0.00646 0.09439 0.49780
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.54775 0.07454 20.76 < 2e-16 ***
Dispersal_fraction 0.66056 0.16271 4.06 0.000253 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.1928 on 36 degrees of freedom
Multiple R-squared: 0.3141, Adjusted R-squared: 0.295
F-statistic: 16.48 on 1 and 36 DF, p-value: 0.0002531
A simple-minded approach would be to draw a random dispersal fraction, and then use the regression (including the residuals) to get a value for mulog. But really, we should be doing draws from a multivariate distribution. The challenge, then, is taking into account the (presumably) beta distribution for the dispersal fraction. We can use Morris & Doak’s approach of simulating multivariate normals and then matching quantiles to get the beta distribution; since the latter is pretty symmetric (looks like the mean is around 0.4) it should recover the correlation coefficient pretty closely.
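A minimal sketch of that quantile-matching idea (all parameter values here are placeholders roughly eyeballed from the fits above; the beta shape parameters in particular are illustrative, not estimated):

```r
library(MASS)  # for mvrnorm

set.seed(1)
rho   <- 0.56  # observed mulog-dispersal fraction correlation
Sigma <- matrix(c(1, rho, rho, 1), 2, 2)
z     <- mvrnorm(1000, mu = c(0, 0), Sigma = Sigma)

# Quantile matching: push the first normal margin through its CDF, then
# through the beta quantile function (shapes chosen to give mean ~0.4)
disp_frac <- qbeta(pnorm(z[, 1]), shape1 = 8, shape2 = 12)
# The second margin stays normal, rescaled to the mulog scale
mulog <- 1.85 + 0.2 * z[, 2]
```

Because the beta is fairly symmetric here, the realized correlation between disp_frac and mulog should land close to the target rho.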
A bigger conceptual issue is that the points in the scatterplots above are each measured with error, in both directions (binomial sampling for the dispersal fraction, MLE standard errors in mulog). This is another reason to avoid OLS regression; but does it introduce bias into the estimate of the correlation? There may be a solution to this at https://www.rasch.org/rmt/rmt101g.htm
Third, is sdlog independent of mulog (we have already seen that it’s independent of dispersal fraction)? What about its relationship to the major and minor axes of the mulog-dispersal fraction distribution?
Finally, when we are working with the populations, do we assume that all pots in a replicate at a particular generation experience the same dispersal kernel? Or that each pot gets an independent sample of the kernel? If most of the spread is driven by seeds coming from the front pot, then it won’t matter much.
Actually, I’ve convinced myself that I don’t need to do this. We will be using the covariance rather than the correlation, and some calculations reveal that the covariance is not biased by measurement error. The disattenuation calculation is to correct for the inflated variances caused by measurement error that are used in calculating the correlation from the covariance.
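For reference, the disattenuation correction mentioned above is just division by the square root of the product of reliabilities, where each reliability is the error-corrected variance over the raw variance. A sketch, plugging in the raw and corrected variances computed in this entry (illustrative only, since the covariance argument makes it unnecessary):

```r
# Classical disattenuation: r_true = r_obs / sqrt(rel_x * rel_y)
rel_mulog <- 0.03825842 / 0.05273925  # corrected / raw variance of mulog
rel_frac  <- 0.03740287 / 0.03795841  # same for the dispersal fraction
0.5604 / sqrt(rel_mulog * rel_frac)   # disattenuated correlation, ~0.66
```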
Of course, I will need those corrected variances for populating the VC matrix to use in simulation. The way to do this is to subtract the mean squared standard error from the raw variance estimates.
I’ve modified the kernel fitting routine to return the standard errors of the parameter estimates. So now for the corrections:
var(Ler_dispersal_stats$mulog)
[1] 0.05273925
var(Ler_dispersal_stats$mulog) - mean((Ler_dispersal_stats$se_mulog)^2)
[1] 0.03825842
Ler_dispersal_stats <- mutate(Ler_dispersal_stats,
binom_var = Dispersal_fraction * (1 - Dispersal_fraction) /
Total_seeds)
var(Ler_dispersal_stats$Dispersal_fraction)
[1] 0.03795841
var(Ler_dispersal_stats$Dispersal_fraction) - mean(Ler_dispersal_stats$binom_var)
[1] 0.03740287
The first correction is substantial, the second one minor.
qplot(mulog, sdlog, data = Ler_dispersal_stats)
disp_pca <- princomp(~ mulog + Dispersal_fraction, data = Ler_dispersal_stats)
screeplot(disp_pca)
biplot(disp_pca)
qplot(disp_pca$scores[,1], Ler_dispersal_stats$sdlog)
qplot(disp_pca$scores[,2], Ler_dispersal_stats$sdlog)
summary(lm(Ler_dispersal_stats$sdlog ~ disp_pca$scores[,2]))
Call:
lm(formula = Ler_dispersal_stats$sdlog ~ disp_pca$scores[, 2])
Residuals:
Min 1Q Median 3Q Max
-0.21719 -0.07239 -0.01247 0.06634 0.26884
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.45126 0.01783 25.309 <2e-16 ***
disp_pca$scores[, 2] -0.27682 0.13037 -2.123 0.0407 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.1099 on 36 degrees of freedom
Multiple R-squared: 0.1113, Adjusted R-squared: 0.08661
F-statistic: 4.508 on 1 and 36 DF, p-value: 0.04067
cov(subset(Ler_dispersal_stats, select = c(mulog, sdlog, Dispersal_fraction)))
mulog sdlog Dispersal_fraction
mulog 0.052739247 -0.0086961783 0.0250738618
sdlog -0.008696178 0.0132258602 0.0001367011
Dispersal_fraction 0.025073862 0.0001367011 0.0379584103
cor(subset(Ler_dispersal_stats, select = c(mulog, sdlog, Dispersal_fraction)))
mulog sdlog Dispersal_fraction
mulog 1.0000000 -0.329267986 0.560402749
sdlog -0.3292680 1.000000000 0.006101068
Dispersal_fraction 0.5604027 0.006101068 1.000000000
It looks like there is heteroskedasticity (sdlog is larger for high values of PC1, which corresponds to low values of mulog), and a weak negative association with PC2.
After repeated nagging from ProjectTemplate, I ran the script to update the project structure to v. 0.8. This changed the doc directory to docs, and added some new variables to global.dcf. Unfortunately, the latter don’t seem to be documented… I’ll also need to make sure that I update the package on my computer at home.
I added a new munge script to calculate the dispersal stats and fits. To prevent this from recalculating every time, I cached the result; but now I need to add caching to all the munge scripts, and, after running once, turn off caching… Done!
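For the record, the caching pattern in a munge script looks something like this (a sketch using ProjectTemplate’s cache(), which writes the named variable to the cache/ directory so later load.project() calls can restore it without re-munging):

```r
# Sketch of the munge-script caching pattern: compute once, then cache.
Ler_dispersal_stats <- calc_dispersal_stats(disperseLer)
Ler_dispersal_stats <- merge(Ler_dispersal_stats,
                             ddply(disperseLer, "ID", fit_dispersal_models,
                                   plot.it = FALSE),
                             by = "ID")
cache("Ler_dispersal_stats")
```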